After approximately 2 months as a beta product, ACML 6 is releasing as a production SDK today! This is AMD’s first attempt to support heterogeneous computing in ACML: offloading significant computation to OpenCL™ devices without users having to change their source code. ACML 6 leverages the open source clMath library projects on the back end to provide the heterogeneous acceleration.
Several enhancements and bug fixes have been incorporated into the release since the beta:
- The FFTW wrapper library added support for heterogeneous offload of real transforms
- The FFTW wrapper library no longer requires the FFTW CPU library to be installed, but it is up to the developer to call the AMD wrapper library only with supported clFFT features (documented at https://github.com/clMathLibraries/clFFT)
- Heterogeneous acceleration (offload to OpenCL device) has been added for two BLAS L2 routines: GEMV and SYMV
- OpenCL is now detected properly on Ubuntu-based systems
- OpenCL kernel performance for BLAS routines has improved with the inclusion of clBLAS tuning databases
- The load balancing heuristics now examine how much dedicated memory is available on the OpenCL device, and will only offload the computation if the problem fits in the available memory
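To make the last item more concrete, here is a minimal sketch of what a "does it fit?" check could look like for a GEMV call (y = alpha*A*x + beta*y). This is not ACML's actual heuristic, which is internal to the library; the function names `gemv_footprint_bytes` and `fits_on_device` are hypothetical, and the sketch assumes the caller already knows how many bytes are free on the device and that the matrix and both vectors must all be resident there at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ACML): bytes needed on the device for
 * y = alpha*A*x + beta*y, where A is m-by-n, x has n elements and y has
 * m elements. Assumes m*n does not overflow 64 bits. */
static uint64_t gemv_footprint_bytes(uint64_t m, uint64_t n, size_t elem_size)
{
    assert(elem_size > 0);
    return (m * n + n + m) * (uint64_t)elem_size;
}

/* Hypothetical decision rule: offload only when the whole problem fits in
 * the memory the caller reports as available on the OpenCL device. */
static bool fits_on_device(uint64_t required_bytes, uint64_t available_bytes)
{
    return required_bytes <= available_bytes;
}
```

For a 1000-by-1000 double-precision GEMV, the footprint under these assumptions comes to 8,016,000 bytes, so the sketch would offload to a device reporting 1 GiB available and stay on the CPU for one reporting only 4 MB. A real heuristic would likely weigh more than memory alone, such as transfer cost and problem size.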
I would like to thank the forum users who downloaded the ACML 6 beta and tried it in their environments, giving us valuable feedback on their experiences. You can download the production version from the ACML download page.
Kent Knox is a Member of Technical Staff at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.