To illustrate how the software ecosystem enables gains in the floating point performance that is so critical to HPC, here are two examples of software tuned for Quad-Core AMD Opteron™ processors, along with how to put each one together.
The example applications are:
1. HPL – the High Performance Linpack TPP Benchmark, which solves a dense linear system of equations and measures the floating point rate of execution. We'll use this benchmark to illustrate the power of Quad-Core AMD Opteron processors combined with the AMD Core Math Library (ACML).
2. POP – the Parallel Ocean Program from Los Alamos National Laboratory, which models oceanic current flow. This application is well suited to demonstrating the effects of compiler flag mining (systematically searching for the best-performing combination of compiler optimization flags) on application performance.
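HPL reports its result as a floating point rate, which is simply the operation count for the problem divided by the wall-clock time. The sketch below assumes HPL's standard operation count of 2/3 N^3 + 3/2 N^2 for an N-by-N system; the helper names `hpl_flop_count` and `hpl_gflops` are hypothetical and not part of HPL itself, and the problem size and timing in the usage lines are made-up illustration values.

```python
def hpl_flop_count(n: int) -> float:
    """Operation count credited for solving an n-by-n dense system:
    2/3 n^3 (LU factorization) + 3/2 n^2 (lower-order terms)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_gflops(n: int, seconds: float) -> float:
    """Floating point rate in GFLOP/s for a run of size n taking `seconds`."""
    if seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return hpl_flop_count(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical run: N = 30000 solved in 300 seconds.
    print(f"{hpl_gflops(30_000, 300.0):.2f} GFLOP/s")
```

Because the n^3 term dominates, doubling N multiplies the work by roughly eight, which is why HPL runs are usually sized to fill most of the available memory.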
Before we discuss the specifics of these examples, let's define what constitutes an HPC stack. On the software side, the fundamental requirements are an operating system and developer tools, starting with compilers, plus performance libraries for floating point computation. HPC applications commonly communicate between processes across a network, and the Message Passing Interface (MPI) is the standard programming model for doing so. On the hardware side, the stack requires one or more AMD processor-based systems and network interconnects (i.e., NICs, switches, or high-performance fabrics). The application software for both examples is public and easily obtainable.
See the diagrams below. Figure 1 shows a software and hardware stack for HPC. This is a broadly generalized illustration of the scope and breadth of an HPC infrastructure. Figure 2 is a simplified view that shows the components used with our example applications.
Figure 1: HPC Solution Stack – The dashed-line boxes indicate "optional" or "non-critical" components for our particular examples. Job management tools and elaborate disk solutions are not used in these sample performance illustrations.
A key advantage of this solution stack is the optimization work that has gone into the software ecosystem. AMD's many contributions to the GCC compiler and other GNU projects, our partnership with The Portland Group (PGI), our support of ongoing operating system improvements with our OS partners, and our own ACML offering have resulted in better high-performance software building blocks. We continue to provide optimization training and support to our partners as part of our wide-ranging collaboration.

Figure 2: HPC Solution Stack for the Example Applications (i.e., HPL and POP) – Note that POP does not directly use a BLAS and therefore ACML 4.0.1 is not relevant to the POP example shown below.
Figures 3 and 4 below present sample performance data showing the direct impact of using the AMD software ecosystem. The more in-depth Tutorials discuss further details and their performance implications.
Figure 3: Raw HPL performance data using ACML Version 4.0.1 – Also using Open MPI 1.2.5; sample data for HPC software development consideration only, not meant for competitive analysis
With ACML 4.0.1 applied to the High Performance Linpack example built using GCC 4.1.2 and Open MPI 1.2.5, a 2P system outperforms a 4P system by 64%! Note that each system has 8 computing cores, so the 2P system has four cores per socket and the 4P system has two. Because the software is identical on both systems, the 64% performance uplift comes from the Quad-Core AMD Opteron processors combined with ACML 4.0.1, which is tuned to exploit them.
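A 64% uplift is a relative figure: the new rate divided by the old rate, minus one. HPL results are also often judged against the system's theoretical peak, which is sockets × cores per socket × clock rate × floating point operations per cycle. The sketch below shows both calculations. The helper names are hypothetical; the clock rate (2.3 GHz, borrowed from the Model 2356 named in Figure 4, since Figure 3 does not state a model) and the assumption of 4 double-precision FLOPs per cycle per core are illustrative assumptions to check against the processor documentation, and the "measured" rates are made-up values chosen only to show the arithmetic, not data from Figure 3.

```python
def peak_gflops(sockets: int, cores_per_socket: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: sockets x cores x clock (GHz) x FLOPs per cycle per core."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


def uplift_percent(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, e.g. 164 vs 100 GFLOP/s is a 64% uplift."""
    return (new / old - 1.0) * 100.0


def efficiency_percent(measured: float, peak: float) -> float:
    """Measured HPL rate as a percentage of theoretical peak."""
    return measured / peak * 100.0


if __name__ == "__main__":
    # Hypothetical 2P configuration: 2 sockets x 4 cores at 2.3 GHz,
    # ASSUMING 4 double-precision FLOPs per cycle per core.
    peak = peak_gflops(2, 4, 2.3, 4)
    print(f"peak: {peak:.1f} GFLOP/s")
    # Hypothetical measured rates, used only to demonstrate the formulas.
    print(f"uplift: {uplift_percent(164.0, 100.0):.0f}%")
    print(f"efficiency at 60 GFLOP/s: {efficiency_percent(60.0, peak):.1f}%")
```

Comparing efficiency rather than raw GFLOP/s makes it easier to see how much of a processor's capability a tuned library such as ACML actually delivers.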

Figure 4: Raw Parallel Ocean Program data with various compiler flags using GCC 4.1.2 on 2P Quad-Core AMD Opteron™ Processor Model 2356; sample data for HPC software development consideration, not meant for competitive analysis
AMD values the GCC compiler and contributes directly to its development and evolution. Compared with the -O2 baseline (note that GCC itself uses -O0 when no optimization flag is given), the performance of the Parallel Ocean Program improves by 253% with the -O3 compiler flag and by 299% with the larger set of flags listed in Figure 4 above. These results come from a flag mining exercise that is discussed in the Tutorial section.
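A flag mining exercise boils down to timing the same program under several flag sets and ranking each set against a baseline. The harness below is a minimal sketch of that loop, not the procedure used to produce Figure 4. It assumes improvement is expressed as (baseline time / candidate time − 1) × 100. The names `mine_flags` and `gcc_measure` are hypothetical, and `gcc_measure` assumes a single-file C program with `gcc` on the PATH; POP itself is a Fortran code built through its own makefiles, so a real run would plug in its own build-and-time step. The timings in the demo are invented placeholders.

```python
import os
import subprocess
import tempfile
import time
from typing import Callable, Dict, List, Sequence, Tuple

# A measure function maps a list of compiler flags to a runtime in seconds.
Measure = Callable[[Sequence[str]], float]


def gcc_measure(source: str, runs: int = 3) -> Measure:
    """Hypothetical measure: compile `source` with gcc and the given flags,
    run the binary `runs` times, and return the best wall-clock time."""
    def measure(flags: Sequence[str]) -> float:
        with tempfile.TemporaryDirectory() as tmp:
            exe = os.path.join(tmp, "bench")
            subprocess.run(["gcc", *flags, source, "-o", exe], check=True)
            best = float("inf")
            for _ in range(runs):
                start = time.perf_counter()
                subprocess.run([exe], check=True)
                best = min(best, time.perf_counter() - start)
            return best
    return measure


def mine_flags(flag_sets: Dict[str, Sequence[str]], baseline: str,
               measure: Measure) -> List[Tuple[str, float]]:
    """Time every flag set and return (name, % improvement over baseline),
    best first. Improvement = (t_baseline / t_candidate - 1) * 100."""
    times = {name: measure(flags) for name, flags in flag_sets.items()}
    base = times[baseline]
    ranked = [(name, (base / t - 1.0) * 100.0) for name, t in times.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Invented timings stand in for real runs so the ranking logic can be shown.
    fake_times = {("-O2",): 9.0, ("-O3",): 3.0, ("-O3", "-funroll-loops"): 2.5}
    candidates = {"O2": ["-O2"], "O3": ["-O3"],
                  "O3+unroll": ["-O3", "-funroll-loops"]}
    for name, pct in mine_flags(candidates, "O2", lambda f: fake_times[tuple(f)]):
        print(f"{name:>10}: {pct:6.1f}% over -O2")
```

Separating the timing step (`measure`) from the ranking step keeps the search logic independent of how a given application is built, and reporting the best of several runs reduces noise from other activity on the system.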
The AMD ecosystem of software tools and resources for optimizing HPC applications on AMD processors and their underlying platforms provides key advantages that shouldn't be overlooked. Read on for a wide range of information, from general parallel programming basics to step-by-step instructions for best practices in HPC programming.