Open Source Strikes Again: Accelerated Math Libraries at AMD


Over the course of the past month or two, you may have seen a series of articles from our engineers on open source libraries. These libraries are aimed at accelerating math calculations for high performance computing, using GPUs and OpenCL™. You can read the blogs here.

AMD Compute Libraries

These point releases were not random. There is a method to our madness. We have just announced the AMD Compute Libraries (ACL) as an open Beta. You can learn more about the Beta 1 of ACL here. All the source code, readme files, and documentation are available on GitHub. Each of the clMath libraries we wrote about is a component of ACL.

But wait, there’s more!

I have told you elsewhere about AMD’s commitment to open source software. ACL provides developers with a growing collection of open source libraries designed to enhance the performance of your code, whether it is running on a GPU or on a CPU.

As part of this ongoing effort, on the CPU-side we are transitioning the AMD math libraries from the proprietary, closed source ACML codebase to the open source, BSD-licensed SHPC libraries. The Science of High-Performance Computing group, based at The University of Texas at Austin, developed and maintains the SHPC libraries.

That’s right, AMD’s formerly proprietary high-performance math library code is moving to open source.

The BLIS framework provides a high-performance BLAS-like API that replaces the BLAS API provided by ACML. The libflame library also provides a high-performance LAPACK-like API that replaces the LAPACK API provided by ACML.
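
For readers new to the BLAS family, here is a minimal sketch of what a single-precision matrix multiply (sgemm, the routine measured in Figure 1) computes. This is a plain reference loop, not BLIS or ACML code: the name `ref_sgemm` and its argument list are hypothetical, chosen only to resemble the traditional column-major, leading-dimension convention. Optimized libraries compute the same result with blocked, hand-tuned kernels.

```c
#include <stddef.h>

/* Hypothetical reference routine (not part of BLIS, libflame, or ACML).
 * Computes C = alpha * A * B + beta * C with no transposes.
 * A is m-by-k, B is k-by-n, C is m-by-n.
 * Storage is column-major, as in traditional BLAS:
 * element (i, j) of A lives at a[i + j * lda]. */
void ref_sgemm(size_t m, size_t n, size_t k,
               float alpha, const float *a, size_t lda,
               const float *b, size_t ldb,
               float beta, float *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

A program written against a BLAS-style interface passes the same kind of information (dimensions, scalars, pointers, and leading dimensions), which is why one library can stand in for another behind that interface.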


The benefits are enormous, but to name a few:

  • AMD-provided math libraries are now available under either Apache or BSD licenses.
  • No more inconvenient dependencies on the Fortran runtime.  The source for both BLIS and libflame is written in C, or in assembly where performance requires it.
  • Performance that matches or exceeds the ACML library. (See Figure 1)
  • An expanded API that is a strict superset of the traditional BLAS and LAPACK APIs.  BLIS and libflame provide a wrapper for programs written to the older ACML API.
  • A native API that offers the potential for greater acceleration, for programs willing to interface with the native API.

Optimizations for the latest 6th generation AMD A-series processors (code-named “Carrizo”) have been checked into the master branch of the BLIS GitHub repository.


Figure 1 shows a performance comparison between BLIS and ACML, measured in gigaflops. We used this configuration:

  • Processor: FX-8800P (code name “Carrizo”)
  • Operating system: Ubuntu Linux®04
  • BLIS v0.1.8
  • ACML v6.1

Figure 1: sgemm performance of BLIS compared to ACML over a range of matrix sizes

The “Carrizo peak” line marks the processor’s theoretical maximum performance, in gigaflops. The results show that the move to BLIS has caused no loss in overall performance.
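
Gigaflops figures for a matrix multiply are conventionally derived from the matrix dimensions and the measured run time. The sketch below assumes the standard operation count of 2·m·n·k for a gemm on an m-by-k and a k-by-n matrix. The function names are hypothetical and do not come from BLIS or ACML, and this post does not state exactly how the plotted numbers were computed.

```c
#include <assert.h>
#include <stddef.h>

/* Conventional operation count for C = alpha*A*B + beta*C:
 * each of the m*n outputs takes k multiplies and k adds. */
double sgemm_flops(size_t m, size_t n, size_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Convert an operation count and an elapsed time into gigaflops. */
double gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

/* Fraction of a theoretical peak that a measured rate achieves. */
double fraction_of_peak(double achieved_gflops, double peak_gflops)
{
    assert(peak_gflops > 0.0);
    return achieved_gflops / peak_gflops;
}
```

For example, a 1000 x 1000 x 1000 sgemm that completes in 0.5 seconds works out to 4 gigaflops. Dividing a measured rate by the peak shows how close a library gets to the hardware limit.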

AMD will continue working with the Science of High-Performance Computing group to bring further functional and optimization improvements for AMD hardware in releases to come.

For those wishing to get started with the BLIS framework, an overview and getting started guide are available.

The SHPC software stack was funded, in part, by the National Science Foundation’s Software Infrastructure for Sustained Innovation.

Jim Trudeau is Senior Manager for Developer Outreach at AMD. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

OpenCL is a trademark of Apple Inc. used by permission by Khronos.


5 Responses

  1. B

    BLIS has been around for some time. Is AMD, or will AMD be, involved in development and optimization of the code (supposedly/especially the kernels for AMD *PUs) through contribution of material/financial/intellectual resources? AFAIK inspiration has been taken from the OpenBLAS project for Piledriver microkernels in BLIS. Is AMD going to contribute to further optimization of the code? And for users of the current BLIS, a table of optimal environment variable settings for various problems would be interesting, as one of the interesting features of BLIS is the flexibility of the threading within the library.

    • jtrudeau

      The short answer is yes, to both questions. We have contributed code to BLIS and expect that will continue.

      I have passed your comment on the environment variable settings to the team. Thank you for the feedback.

  2. notzed

    Well done on a smart decision.

    Pleased to see also that it is a C driver – aids cross platform/language use (ms/android/linux, c/java in my case).