A1. The biggest gains typically come from performance libraries. For math performance libraries, especially in scientific computing (think BLAS and RNGs), you need the AMD Core Math Library (ACML) Version 4 or higher. AMD provides a variety of ACML binaries compatible with different compilers and operating systems.
»Learn more about ACML
A2. Another AMD-sponsored performance library is Framewave, an open source library for signal and image processing. AMD originally authored it and later contributed it to open source.
»Framewave product page »Framewave open source project
A1. First, use more aggressive compiler flags above the default optimization level, for example "-O3" (PGI, Sun Studio, PathScale, GCC) or "-O2" (Visual Studio 2008). Also look for newer compilers that are tuned for the new microarchitecture; AMD has collaborated with different compiler vendors on tuning for AMD Family 10h.
The typical procedure with optimization is to mine flags that best suit your hardware and software combination. As shown with the above flag mining exercise, the single largest jump with GCC occurred with "-O3," above the default optimization level of "-O2."
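A flag mining run can be organized by first enumerating the flag combinations to try. The following is a minimal Python sketch, not the procedure used for the exercise above. The flag lists and the source file name `bench.c` are assumptions chosen for illustration; `-march=amdfam10` is the GCC target name for Family 10h in recent GCC releases, so check your compiler's documentation before relying on it.

```python
import itertools

# Assumed candidates for illustration; substitute the flags your compiler supports.
BASE_LEVELS = ["-O2", "-O3"]
EXTRA_FLAGS = ["-funroll-loops", "-march=amdfam10"]


def candidate_flag_sets(levels=BASE_LEVELS, extras=EXTRA_FLAGS):
    """Yield each optimization level combined with every subset of the extras."""
    for level in levels:
        for count in range(len(extras) + 1):
            for combo in itertools.combinations(extras, count):
                yield [level, *combo]


def compile_command(flags, source="bench.c", output="bench"):
    """Build a GCC command line; bench.c is a hypothetical benchmark source."""
    return ["gcc", *flags, source, "-o", output]


def best_flags(timings):
    """Given {flag string: seconds}, return the flag string with the lowest time."""
    return min(timings, key=timings.get)


for flags in candidate_flag_sets():
    print(" ".join(compile_command(flags)))
```

Each printed command can then be compiled and timed on the target machine, with the measured times collected into a dictionary for `best_flags`.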
A2. AMD provides coding tips for several different compiler environments which include information targeting Quad-Core AMD Opteron processors.
In addition, AMD maintains and periodically updates software optimization guides for AMD processors, as well as a compiler usage guide which rolls up performance recommendations and suggested compiler flags on a variety of compilers.
A3. Take advantage of multi-core. This means multi-threading in some way: explicit threading, runtime libraries such as MPI, or compiler-based parallelization with auto-parallelization or OpenMP.
A. Something has changed. Generally, in a simple setup, if you do things in the same order, you can expect reasonable consistency. However, some things are not easily under your control, such as the operating system and associated add-ins. A classic example is indexing software that runs at periodic intervals to support fast file searches; if it wakes up at the wrong time, it will affect your results. Another example is the file system: Linux supports a variety of file systems, and each has different performance characteristics and memory footprints. A common recommendation for some file systems is to disable journaling options. There can also be subtleties in how the OS memory manager reclaims memory. If memory does not quite return to its starting state after a test sequence, the performance of later runs can be affected. Summary:
Understand the impact of any background tasks and think about shutting down unneeded services.
On Linux, look at /proc/meminfo and review it for any deltas over time. Look at MemFree, Buffers, Active, and any other values with large deltas. Also consider whether you are paging to disk; /proc/swaps can give you hints.
Compare runs made right after a reboot with later runs to check for possible deterioration over time.
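The /proc/meminfo comparison can be scripted. The following is a minimal Python sketch that assumes the usual `Name:   value kB` layout of /proc/meminfo; the function names are illustrative and not part of any AMD tool.

```python
from pathlib import Path

# Fields named in the summary; add others that show large deltas on your system.
WATCHED = ("MemFree", "Buffers", "Active")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def meminfo_deltas(before, after, fields=WATCHED):
    """Return {field: after - before} for fields present in both snapshots."""
    return {f: after[f] - before[f] for f in fields if f in before and f in after}


def read_snapshot(path="/proc/meminfo"):
    """Read and parse the current memory statistics (Linux only)."""
    return parse_meminfo(Path(path).read_text())
```

Call `read_snapshot()` once before and once after a benchmark run, then pass both results to `meminfo_deltas` to see how far the system is from its starting state.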
A. This is very important, and it is a worthwhile investment. A bare minimum standard nowadays is Gigabit Ethernet. Suppose you were to use 100 Mbps switches instead of 1 GbE on a dual-workstation config, as in the Unifex+Rogatien case above with GCC and ACML Version 4. The peak HPL performance in HPC Challenge would drop from 26.97 Gflops to 19.4 Gflops, about a 28% drop. The performance difference with POP might be about a 25% drop on the same two systems. This very simple example should give you a feel for how significantly the network infrastructure can affect MPI interprocess communication and overall performance.
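The 28% figure follows directly from the two HPL numbers quoted above. A minimal Python sketch of the arithmetic (the function and variable names are illustrative):

```python
def percent_drop(before, after):
    """Relative decrease from before to after, as a percentage of before."""
    return (before - after) / before * 100.0


hpl_gigabit = 26.97        # Gflops with Gigabit Ethernet, from the text
hpl_fast_ethernet = 19.4   # Gflops with 100 Mbps switches, from the text

print(f"{percent_drop(hpl_gigabit, hpl_fast_ethernet):.1f}% drop")  # prints "28.1% drop"
```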