Now that the ACML 4.3.0 release is complete and posted live on AMD Developer Central, I've been spending time collecting all the performance data needed to document the improvements in the 4.3.0 release. Several new features should show up nicely in performance graphs. Improvements include a new SGEMM kernel for AMD Family 10h processors, new DGEMM and SGEMM kernels for Intel Woodcrest, Penryn, and Nehalem processors, improved Level 1 BLAS kernels, 3D FFT improvements, and new scalar acml_mv functions. It's a really long list!
You can demonstrate these performance improvements yourself with the examples in the performance directory of the ACML installation. The examples cover a few routines and are easy to modify to exercise other routines as well.
A couple of trends jump out from the data collected so far. First, the 4.3.0 Level 3 BLAS routines run much faster than previous versions on Intel machines. They are very competitive with Intel MKL on Intel processors!
Second, the Intel Nehalem is a very impressive processor. However, the six cores of AMD's Istanbul can crank out a lot of raw DGEMM flops. This graph tells the story:
More information on ACML 4.3.0 is available on the ACML home page. If you have feedback on how the new release improves performance for your application, we’d love to hear about it.
This post is the opinion of the author and may not represent AMD’s positions, strategies or opinions. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.