Introduction to ACML_MV: Fast Math and Fast Vector Math Library
ACML_MV is a library that contains fast and/or vectorized versions of some familiar math library routines, such as sin, cos and exp. The routines exploit the AMD64 architecture for performance, so they are currently available only with the 64-bit versions of ACML. The routines are very accurate over their range of acceptable input arguments.
Some of the performance is gained by sacrificing error handling or by not accepting certain arguments. It is therefore the responsibility of the caller to ensure that the arguments passed to these routines are suitable. Furthermore, some of the routines cannot be called from high-level languages at all and must instead be called from assembly language; see the documentation of the individual routines for details. Hence, these routines are intended for knowledgeable users only.
The documentation for each routine states what outputs are returned for special arguments and gives an indication of the routine's performance. In general, special-case arguments for any routine cause a return value in accordance with the C99 language standard.
Special-case arguments include NaNs and infinities, as defined by the IEEE arithmetic standard (IEEE 754). In these documents, NaN means Not a Number, QNaN means Quiet NaN, and SNaN means Signalling NaN.
A denormal number is a number that is very tiny (close to the machine arithmetic underflow threshold) and is stored with less precision than a normal number. Because of their special nature, operations on such numbers are often very slow. Although such numbers are not necessarily regarded as special-case arguments, some of the ACML_MV routines have been designed not to handle them, for the sake of performance. This is noted in the documentation for each ACML_MV routine.
Accuracy of a routine is quoted in ulps, where ulp stands for Unit in the Last Place. Floating-point numbers on a computer are limited-precision approximations of mathematical numbers: not every real number can be represented exactly as a machine number, so in general a real number must be rounded to a nearby machine number of the available precision. An ulp is the distance between the two adjacent machine numbers that bracket a real number.
In this document, the ulp is used as a measure of the error in a returned result when compared with the mathematically exact expected result. Because the returned result must itself be a machine number, a routine cannot in general guarantee an error smaller than 0.5 ulps (the maximum error of a correctly rounded result), and an error of less than 1 ulp is considered good.
Some of the functions in ACML_MV include a weak alias to the equivalent function in libm. For example, the fastcos function includes a weak alias to cos. If ACML_MV is included in the link order before libm, then all calls to the aliased libm function name (e.g. cos) will use the equivalent ACML_MV routine (e.g. fastcos). If ACML_MV is included in the link order after libm, then calls to those names will use the libm versions.
ACML_MV routines can always be accessed using their ACML_MV names (e.g. fastcos), regardless of link order.
The following types are used to describe the functions contained in this chapter:
- __m128d: a pair of double-precision values
- __m128: four single-precision values