In October 2015, AMD released ACL 1.0 Beta 2, the second beta of the AMD Compute Library (ACL), which brought important improvements to the clBLAS, clFFT, and clSPARSE libraries relative to the Beta 1 release. Since then, the team has worked hard to deliver even more improvements, which are now available in the GA release.
The GA release continues AMD’s goal of providing a unified repository for a variety of open-source math libraries that allow you to accelerate computations on AMD GPUs, APUs, and CPUs. All of the source code, readmes, and documentation are available at the respective GitHub links listed at the end of this post.
In the following sections, we’ll list the significant features in the GA, Beta 2, and Beta 1 releases.
The AutoGemm functionality included in Beta 2 allowed users to automatically generate optimized kernels for various matrix sizes, but was restricted to “Hawaii”-based dGPUs. The GA version of clBLAS extends this feature to “Fiji”-based dGPUs.
The GA version of clBLAS also fixes multi-GPU and multi-context support. Earlier releases worked only when every OpenCL™ context used an identical dGPU. With this fix, the GA version runs well on systems with different dGPUs.
For context, here are the features introduced in the clBLAS Beta 2 and Beta 1 versions at a glance.
- Introduced AutoGemm to automatically generate kernels for various matrix sizes
- Provided performance improvements in non-square GEMM computations, SGEMM at multiples of 1024, DGEMM at large sizes using up to 32 GB of memory, and the DTRSM algorithm
- Introduced single- and double-precision GEMM kernels for the Graphics Core Next (GCN) architecture that improved the performance of multiplying two square matrices
- Introduced offline kernel compilation mode
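For readers new to the terminology, GEMM is the BLAS general matrix-matrix multiply, C = alpha*A*B + beta*C. The plain-Python reference below is only a sketch of what the operation computes, including a non-square case. It is not clBLAS code and says nothing about how the GPU kernels are implemented; the function name `gemm` and the sample matrices are made up for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for row-major lists of lists.

    Reference implementation for illustration only (hypothetical helper,
    not part of clBLAS).
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("incompatible shapes")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


# Non-square case: a (2 x 3) matrix times a (3 x 2) matrix gives a 2 x 2 result.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
result = gemm(2.0, A, B, 0.5, C)
```

SGEMM and DGEMM are the single- and double-precision variants of the same operation.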
The Beta 2 version of clFFT included a pre-callback feature that enables faster custom pre-processing of input data directly by the library via a user callback function. The GA version goes a step further: it introduces a post-callback feature that enables faster custom post-processing of output data directly by the library via a user callback function. For more information about the post-callback feature, see this blog.
The GA version of clFFT also increases the range of sizes supported for 1D in-place transforms and enables very large 1D FFTs.
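To make the callback idea concrete, here is a minimal sketch in plain Python rather than OpenCL. It assumes a naive DFT and hypothetical `pre_callback` and `post_callback` parameters, which are not the clFFT API. The point is only that the transform applies the user function as it reads or writes each element, so no separate pass over the data is needed.

```python
import cmath


def dft(x, pre_callback=None, post_callback=None):
    """Naive O(n^2) DFT with optional per-element callbacks.

    pre_callback(x, i)  -> value to use as input element i
    post_callback(k, v) -> value to store as output element k
    Both parameters are hypothetical and only illustrate the concept.
    """
    n = len(x)
    # Inputs pass through the pre-callback as they are read.
    data = [pre_callback(x, i) if pre_callback else x[i] for i in range(n)]
    out = []
    for k in range(n):
        acc = sum(
            data[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n)
        )
        # Outputs pass through the post-callback as they are written.
        out.append(post_callback(k, acc) if post_callback else acc)
    return out


signal = [1.0, 0.0, -1.0, 0.0]
# Post-processing example: store each bin's magnitude instead of its complex value.
magnitudes = dft(signal, post_callback=lambda k, v: abs(v))
# Pre-processing example: scale every input sample by 2 while it is being read.
scaled = dft(signal, pre_callback=lambda x, i: 2.0 * x[i])
```

Without the callbacks, the same result would need an extra loop over the input before the transform or over the output after it.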
For context, here are the features introduced in the clFFT Beta 2 and Beta 1 versions at a glance.
- Support for power-of-7 size transforms (radix 7)
- Pre-callback feature that enables faster custom pre-processing of input data directly by the library via a user callback function
- Support for large 1D transforms with no extra memory allocation for certain sizes
- Performance improvements in power-of-2 size real transforms, and when ECC is enabled
- General performance improvements in complex transforms over many sizes
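As a rough illustration of what radix support means when choosing a transform length, the helper below checks whether a length factors entirely into a given set of radices. The default set (2, 3, 5, 7) and the idea that mixed products of those radices are accepted are assumptions here and should be checked against the clFFT documentation. `is_supported_length` is a hypothetical name, not a library function.

```python
def is_supported_length(n, radices=(2, 3, 5, 7)):
    """Return True if n is a product of powers of the given radices.

    The default radix set is an assumption made for this sketch.
    """
    if n < 1:
        return False
    for r in radices:
        # Divide out every factor of r; whatever remains must be 1.
        while n % r == 0:
            n //= r
    return n == 1


candidates = [343, 1024, 210, 11, 2 * 13]
supported = [n for n in candidates if is_supported_length(n)]
```

Here 343 is 7 cubed, so a radix-7 implementation can handle it directly, while 11 and 26 contain prime factors outside the assumed set.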
The v0.10 version of clSPARSE introduces an abstraction for the bit width of indices. This release also incorporates API changes that improve library usability and readability; please refer to the project release notes for details.
For context, here are the features introduced in the clSPARSE Beta 2 and Beta 1 versions at a glance.
- Introduced single precision SpM-SpM (SpGEMM) function
- Optimizations to the sparse matrix conversion routines
- SpM-dV routines provide improved numerical accuracy
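The sketch below shows, in plain Python, what a sparse matrix times dense vector (SpM-dV) product computes and why the width of the index type is worth abstracting. It assumes the CSR storage format and uses the standard `array` module typecodes `"i"` and `"q"` as stand-ins for narrower and wider index types. `make_csr` and `spmv` are hypothetical names and not the clSPARSE API.

```python
from array import array


def make_csr(row_ptr, col_idx, values, index_typecode="i"):
    """Build a CSR matrix whose index arrays use the chosen integer type.

    "i" is a C int (typically 32-bit); "q" is a 64-bit signed integer.
    """
    return {
        "row_ptr": array(index_typecode, row_ptr),
        "col_idx": array(index_typecode, col_idx),
        "values": array("d", values),
    }


def spmv(csr, x):
    """Compute y = A * x for a CSR matrix A and a dense vector x."""
    row_ptr, col_idx, values = csr["row_ptr"], csr["col_idx"], csr["values"]
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Non-zeros of this row live in positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for p in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y


# The 3 x 3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

narrow = make_csr(row_ptr, col_idx, values, index_typecode="i")
wide = make_csr(row_ptr, col_idx, values, index_typecode="q")
y_narrow = spmv(narrow, [1.0, 1.0, 1.0])
y_wide = spmv(wide, [1.0, 1.0, 1.0])
```

The multiplication code is identical for both index widths; only the storage cost and the largest addressable matrix change, which is the kind of difference an index-type abstraction hides from the caller.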
The GA version of the clRNG library includes no changes from the Beta 1 version.
Beta 1 included the following features.
- Library demonstrating efficient generation of random numbers with OpenCL on GPUs
- Three base generators: MRG31k3p, MRG32k3a and LFSR113
- Multiple streams of random numbers
- Host and device side interfaces for better device access
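The idea of multiple streams can be sketched as follows: each stream is the same generator started at a different point of its sequence, far enough apart that the streams do not overlap in practice. To stay short, this example swaps in a simple Lehmer (MINSTD) generator instead of the MRG31k3p, MRG32k3a, or LFSR113 generators that clRNG provides. `Stream` and `make_streams` are hypothetical names and not the clRNG API.

```python
MODULUS = 2**31 - 1   # prime modulus of the MINSTD Lehmer generator
MULTIPLIER = 48271    # MINSTD multiplier


class Stream:
    """One stream of uniform random numbers with its own private state."""

    def __init__(self, state):
        self.state = state

    def next_uniform(self):
        # Advance the state by one step and map it into (0, 1).
        self.state = (MULTIPLIER * self.state) % MODULUS
        return self.state / MODULUS


def make_streams(seed, count, spacing):
    """Create `count` streams whose start states are `spacing` steps apart.

    `seed` must be in the range 1 .. MODULUS - 1.
    """
    # MULTIPLIER**spacing mod MODULUS advances a state by `spacing` steps at once.
    jump = pow(MULTIPLIER, spacing, MODULUS)
    streams, state = [], seed
    for _ in range(count):
        streams.append(Stream(state))
        state = (jump * state) % MODULUS
    return streams


streams = make_streams(seed=12345, count=3, spacing=1000)
first_draws = [s.next_uniform() for s in streams]
```

In a GPU setting, each work item would typically own one such stream, so that work items draw numbers independently without sharing state.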
Have fun. Please provide your feedback at: https://community.amd.com/community/devgurus/amd-compute-libraries.
Karthik Dakshinamoorthy is the Program Manager for AMD Compute Libraries. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
OpenCL is a trademark of Apple Inc. used by permission by Khronos.