AMD LibM is a software library containing a collection of basic math functions optimized for x86-64 processor based machines. It provides many routines from the list of standard C99 math functions. AMD LibM is a C library that users can link into their applications to replace compiler-provided math functions. Programmers generally access basic math functions through their compiler, but those who want better accuracy or performance than their compiler’s math functions offer can use this library instead. The library also provides vector functions, which can be used to speed up loops and to perform math operations on multiple elements at once.
AMD LibM 3.1 is the most current version of the library.
What is New in V3.1
- FMA3 and FMA4 code paths improve the performance of scalar functions in the Exponential, Logarithmic and Trigonometric groups.
- Added FMA3 and FMA4 code paths to vector variants of the Exponential, Logarithmic and Trigonometric groups of functions.
- Dynamic dispatch mechanism that automatically selects each function's optimized code path for the latest AMD processors. Selection happens at library load time, so subsequent calls incur very little overhead.
- Added float and double variants of sinpi, cospi and tanpi.
- Bug fixes
A few example programs are included to illustrate usage of AMD LibM functions.
There are 112 C99 functions in this library.
The table below lists the function categories and the number of functions in each.
|Category|Functions|Category|Functions|
|---|---|---|---|
|Trigonometric|19|Remainder|6|
|Hyperbolic|12|Manipulation|10|
|Exp & Log|30|Max & Min & Diff|6|
|Power & Absolute|12|Nearest integer|20|
There are also 6 non-C99 functions. These functions are closely related to some of the C99 functions and are provided for convenience. Refer to AMD LibM Functions for a full list of functions.
The following functions are optimized to take advantage of the new AMD Opteron Family 15h processor instructions.
|Code path|Functions|
|---|---|
|Scalar functions, AVX/FMA4/XOP code path (30 functions)|cbrt, cbrtf, cos, cosf, exp10, exp10f, exp2, exp2f, exp, expf, expm1, expm1f, fma, fmaf, log10, log10f, log1p, log1pf, log2, log2f, log, logf, pow, powf, sin, sincos, sincosf, sinf, tan, tanf|
|Vector functions, AVX/FMA4/XOP code path (46 functions)|vrd2_cbrt, vrd2_cos, vrd2_exp10, vrd2_exp2, vrd2_exp, vrd2_expm1, vrd2_log10, vrd2_log1p, vrd2_log2, vrd2_log, vrd2_sin, vrd2_tan, vrda_cbrt, vrda_cos, vrda_exp10, vrda_exp2, vrda_exp, vrda_expm1, vrda_log10, vrda_log1p, vrda_log2, vrda_log, vrda_sin, vrs4_cbrtf, vrs4_cosf, vrs4_exp10f, vrs4_exp2f, vrs4_expf, vrs4_expm1f, vrs4_log10f, vrs4_log1pf, vrs4_log2f, vrs4_logf, vrs4_sinf, vrs4_tanf, vrsa_cbrtf, vrsa_cosf, vrsa_exp10f, vrsa_exp2f, vrsa_expf, vrsa_expm1f, vrsa_log10f, vrsa_log1pf, vrsa_log2f, vrsa_logf, vrsa_sinf|
Accuracy & Performance
The accuracy of a math function is estimated as the maximum error, measured in units in the last place (ULP), between the obtained answer and the ideal infinite-precision answer over that function’s range. For AMD LibM functions in certain categories (absolute, nearest integer, remainder, manipulation, maximum, minimum, difference), the error is either 0 or 0.5 ULP: these functions produce either exact answers or, where applicable, correctly rounded answers. In the remaining categories (trigonometric, hyperbolic, exponential, logarithmic, and power), the estimated accuracy is better than 1.0 ULP.
Many of the scalar and vector functions in AMD LibM are heavily optimized for performance. Significant effort went into optimizing the trigonometric, exponential and logarithmic categories of functions. Many functions in the power and remainder categories are also well optimized. Applications that make significant use of math functions can benefit from this library.
LibM 3.1 has two optimized code paths: one is SSE2 optimized, and the other is AVX+XOP+FMA4+FMA3 optimized for the new AMD Opteron Family 15h processor. One of these paths is taken based on the features supported by the system processor. The dispatch happens at library load time to ensure very little overhead in subsequent calls to the LibM functions.
System Requirements
- x86-64 processor based machine
- Linux 64 or Windows 64
- GCC 4.1.1 or later for Linux libraries
- Microsoft Visual Studio 2005 or later for Windows libraries
Note: acml_mv used to be a component of the ACML library. AMD LibM is an effort to provide those functions in a stand-alone library and to significantly expand the number of functions. Starting with the latest release, ACML 5.0, ACML no longer ships the acml_mv component. These routines will be provided by LibM 3.0 instead.