This page contains previous versions of ACML. Click the browser’s back button to see the current ACML versions. Note that documentation for previous releases can be found in the corresponding installation files. For Windows®, the package must be installed to access the documentation.
Release Notes
Features introduced with previous ACML releases:
- Version 5.3.0:
- Added FMA3 code paths for many BLAS and FFT routines.
- Updated the LAPACK code to version 3.4.0
- Improved performance for complex-complex out-of-place FFTs
- Added Fast Malloc to more BLAS routines.
- Fast Malloc is now enabled by default, with no need to set an environment variable.
- Routines affected include *GEMM (except CGEMM), *SYR2K, *GEMV, *GER, *TRMV, and *TRSV.
- Fast Malloc is enabled only on Linux.
- Expanded coverage of the FFTW Wrapper examples to include double precision routines
- Added a set of FFTW Wrappers to assist in using ACML FFTs in applications written to use FFTW. These wrappers are provided as source code and are found in the ACML example directories. Examples are provided for a useful subset of the FFTW3 and FFTW2 routines
- Version 5.2.0:
- Improved performance for some common applications, using various optimizations
- Improved dgemm performance for small to medium problem sizes
- Fast Malloc enabled in the single-threaded library.
- Fast Malloc added to dtrsm. Note that the user must set the ACML_FAST_MALLOC environment variable to enable these optimizations.
- Changed OpenMP threading behavior in dtrsm, dpotrf, and dgetrf to limit the number of threads for small problems.
- Enabled better default FFT radix plans for specific HPCC problem sizes. When HPCC is run with these problem sizes, an optimal FFT radix plan is now chosen by default, eliminating the need to use Mode100 planning. This dramatically improves performance.
- “Run-anywhere” builds use FMA4 instructions for key routines, selected at run time based on CPUID feature bits. This allows these libraries to get reasonable performance on AMD Bulldozer CPUs, while still running properly on other processors.
- Added a set of FFTW Wrappers to assist in using ACML FFTs in applications written to use FFTW. These wrappers are provided as source code and are found in the ACML example directories. Examples are provided for a useful subset of the FFTW3 and FFTW2 routines
- Version 5.1.0:
- In addition to SGEMM and DGEMM, CGEMM and ZGEMM have been tuned for AMD Family 15h processors.
- Real-to-complex and complex-to-real FFTs (single and double precision) have been tuned for AMD Family 15h processors.
- Version 4.4.0:
- Performance of ZGEMM has been further improved. This performance improvement carries through to other Level 3 BLAS and LAPACK routines that call ZGEMM.
- Assembly language kernels used by the real-complex FFT routines csfft, dzfft, scfft and zdfft have been re-tuned for AMD Family 10h processors, providing significant performance increases.
- Version 3.6:
- LAPACK code update
- New OpenMP multithreading capability for many LAPACK routines
- Intel FORTRAN compatible Windows® 64 and Linux® 64 libraries
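The CPUID-based selection mentioned for the 5.2.0 “run-anywhere” builds, and the FMA3 code paths added in 5.3.0, can be illustrated with a minimal C sketch. This is not ACML source code. The names `axpy_fn`, `axpy_generic`, `axpy_fma_placeholder`, and `select_axpy` are hypothetical, and the "tuned" kernel is a plain C stand-in. The sketch assumes GCC or Clang (for `<cpuid.h>`) and, for brevity, omits the operating-system check for AVX state support (OSXSAVE/XGETBV) that production code would also need. The bit positions used are the documented ones: FMA3 is CPUID leaf 1, ECX bit 12, and FMA4 is CPUID leaf 0x80000001, ECX bit 16.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

/* Hypothetical kernel type: computes y[i] += a * x[i]. */
typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Portable fallback that runs on any processor. */
static void axpy_generic(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Stand-in for a tuned kernel. A real library would use FMA
 * instructions here; this placeholder is ordinary C. */
static void axpy_fma_placeholder(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* FMA3: CPUID leaf 1, ECX bit 12. Returns 0 on non-x86 targets. */
static int cpu_has_fma3(void)
{
#if SKETCH_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 12) & 1u);
#else
    return 0;
#endif
}

/* FMA4: CPUID leaf 0x80000001, ECX bit 16. Returns 0 on non-x86 targets. */
static int cpu_has_fma4(void)
{
#if SKETCH_X86
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 16) & 1u);
#else
    return 0;
#endif
}

/* Pick a kernel once, based on the feature bits, and reuse the pointer. */
static axpy_fn select_axpy(void)
{
    if (cpu_has_fma3() || cpu_has_fma4())
        return axpy_fma_placeholder;
    return axpy_generic;
}
```

A caller would obtain the pointer once, for example `axpy_fn f = select_axpy();`, and then call `f(n, a, x, y)`. Because the fallback is always available, the same binary runs on processors without either feature.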
By checking this box, you agree to abide by the terms and conditions set forth in the end-user license agreement, above. If you do not agree to abide by these terms and conditions, you are not permitted to use the site or download materials from the site.
Note that the new End User License agreement supersedes the agreement found in previous ACML releases.
*Please note: Read the EULA before downloading. If you are considering bundling ACML with your products, you need a separate redistribution agreement. Refer to the ACML redistribution agreement page for more information.
Downloads