Develop Blazing Fast Code with Microsoft Visual Studio® 2008 and AMD Tools 
Who wants good-enough performance when Orcas-based tools and libraries from AMD can give you gear-stripping, road-scorching code?
Anderson Bailey, updates by AMD, 1/13/2009

Originally published 9/7/2007. Revised 1/13/2009

» Overview
» Working With Visual Studio 2008
» Plug-in to AMD CodeAnalyst™ Performance Analyzer for Windows
» Easily Integrate the Framewave Library
» Going Forward
» Resources

Overview

Now that the Quad-Core AMD Opteron™ processors, including the new “Shanghai” generation, are setting new levels of performance for x86-64 processors, you'll want to take advantage of their many architectural innovations to push your code's performance beyond anything previously possible.

AMD, which has long maintained a deep inventory of developer tools and resources, has several key pieces of software to make burning performance possible.

This article looks at several useful tools and resources, all of which have the distinct advantage of being freely available. (The Resources section at the end of this article provides all the needed URLs.)

Working With Visual Studio 2008

Microsoft Visual Studio 2008 is the latest released IDE for C/C++ on Windows® and for development using Microsoft's .NET languages, especially C# and Visual Basic.

Switches in Visual Studio 2008 provide numerous options for optimizing C and C++ code for Quad-Core AMD Opteron processors. AMD has specific suggestions for compiler switch settings that maximize performance (some of which apply to Athlon™ 64 processors as well).

The basic recommended settings at compilation for maximum performance on Quad-Core AMD Opteron processors are:

  • /O2: Optimize for maximum speed (see note below).
  • /GL: Whole-program optimization, especially interprocedural optimizations (link with /LTCG).
  • /fp:fast: Fast floating point. This lets the compiler reorder and simplify floating-point operations, so results can differ slightly from strict IEEE-conformant evaluation. In many cases these differences are acceptable, so this is generally a good switch to keep set.
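As a minimal sketch of where /fp:fast can matter, consider a floating-point reduction such as a dot product. The build lines in the comment use hypothetical file names, and whether the compiler actually reorders this particular loop depends on the compiler version; the point is only that /fp:fast permits transformations that the default /fp:precise model forbids.

```c
#include <stddef.h>

/*
 * Build sketch (Visual Studio 2008 command prompt; file names are
 * hypothetical):
 *
 *   cl /c /O2 /GL /fp:fast dot.c main.c
 *   link /LTCG dot.obj main.obj /OUT:app.exe
 *
 * /GL defers code generation to link time, so the link uses /LTCG.
 */

/* Dot product of two double arrays.
 * Under /fp:precise the compiler keeps the additions in source order.
 * /fp:fast allows it to reassociate them (for example, into several
 * partial sums), which can be faster but may change the last bits of
 * the result. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```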

Visual Studio supports profile-guided optimization (PGO), which optimizes binaries based on how the program is actually used. Compile with /GL and link with the /LTCG:PGI switch to produce an instrumented build. Train the instrumented program by running it several times in real-world scenarios so that it collects profile data. Then relink with the /LTCG:PGO switch to have the linker re-optimize the code (including inlining decisions, code layout, and function ordering) to favor the usage pattern captured in the profile.
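The PGO steps above can be sketched as follows. The file name pgo_demo.c and the function count_control_bytes are hypothetical; the example assumes that in real use the branch inside the loop is almost never taken, which is exactly the kind of fact a training run records and a static compiler cannot know.

```c
/*
 * PGO build sketch (pgo_demo.c is a hypothetical file name):
 *
 *   1. cl /c /O2 /GL pgo_demo.c
 *   2. link /LTCG:PGI pgo_demo.obj    (instrumented build; writes pgo_demo.pgd)
 *   3. pgo_demo.exe ...               (run representative workloads; each run writes .pgc data)
 *   4. link /LTCG:PGO pgo_demo.obj    (re-link, optimizing with the collected profile)
 */
#include <stddef.h>

/* Hypothetical predicate: true for control bytes other than the common
 * whitespace characters. Assumed to be rare in real input. */
static int is_rare_control_byte(unsigned char c)
{
    return c < 0x20 && c != '\n' && c != '\t' && c != '\r';
}

/* Hot loop whose branch is heavily skewed in practice. With a profile,
 * the optimizer knows which path to lay out as the fall-through and
 * whether inlining the predicate pays off. */
size_t count_control_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    size_t i;
    for (i = 0; i < len; ++i) {
        if (is_rare_control_byte(buf[i]))   /* assumed rarely true in training runs */
            ++count;
    }
    return count;
}
```

Keep the training inputs representative: a profile collected on unusual data steers the optimizer toward the wrong paths.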

To choose which processor architecture to tune code for, use the /favor:blend option, which produces the most advantageous instruction mix when compiling for Quad-Core AMD Opteron processors. (The /favor:AMD64 switch was designed for AMD platforms that predate “Barcelona”.)

For 32-bit code, Visual Studio 2008 defaults to x87 instructions for floating-point math. Use at least the /arch:SSE2 option when targeting Athlon 64 and later processors. (The 64-bit compiler always uses SSE2 for floating-point math.)
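To confirm which floating-point code a given build actually uses, a translation unit can test the compiler's predefined macros. This is a hypothetical helper, assuming MSVC's _M_X64, _M_IX86, and _M_IX86_FP macros and the GCC/Clang macro __SSE2_MATH__; the strings it returns are only descriptive.

```c
/* Hypothetical helper: reports, from predefined compiler macros, which
 * instruction set this build uses for scalar floating-point math. */
const char *fp_codegen(void)
{
#if defined(_M_X64) || defined(_M_AMD64)
    return "SSE2 (64-bit MSVC build: SSE2 is always used)";
#elif defined(_M_IX86_FP) && _M_IX86_FP >= 2
    return "SSE2 or later (32-bit MSVC build with /arch:SSE2 or higher)";
#elif defined(_M_IX86)
    return "x87 or SSE (32-bit MSVC build without /arch:SSE2)";
#elif defined(__SSE2_MATH__)
    return "SSE2 (GCC/Clang build using SSE2 for scalar math)";
#else
    return "x87 or another FPU (no SSE2 math detected)";
#endif
}
```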

For 32-bit code that will run on 64-bit Windows (under WOW64), consider linking with the /LARGEADDRESSAWARE switch. It lets a 32-bit process use the full 4 GB of virtual address space instead of only 2 GB, which is a significant benefit for 32-bit applications that are constrained by address space.
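A sketch for checking, at run time, how much user-mode address space a process actually received. It assumes the Win32 GlobalMemoryStatusEx call, whose MEMORYSTATUSEX.ullTotalVirtual field reports the size of the user-mode portion of the calling process's virtual address space; the helper names and the classification thresholds are hypothetical and approximate.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical classifier. Thresholds are approximate: Windows reserves
 * a small region at the top, so a "2 GB" space reports slightly less
 * than 2^31 bytes. */
const char *classify_user_va(unsigned long long bytes)
{
    const unsigned long long GB = 1024ULL * 1024ULL * 1024ULL;
    if (bytes <= 2 * GB)
        return "2 GB: 32-bit process without /LARGEADDRESSAWARE";
    if (bytes <= 3 * GB)
        return "3 GB: large-address-aware on 32-bit Windows booted with /3GB";
    if (bytes <= 4 * GB)
        return "4 GB: large-address-aware 32-bit process under WOW64";
    return "more than 4 GB: 64-bit process";
}

/* Returns the user-mode virtual address space size of the calling
 * process on Windows, or 0 on other platforms or on failure. */
unsigned long long user_va_bytes(void)
{
#ifdef _WIN32
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof ms;   /* must be set before the call */
    if (GlobalMemoryStatusEx(&ms))
        return ms.ullTotalVirtual;
#endif
    return 0;
}
```

On Windows, classify_user_va(user_va_bytes()) should report the 4 GB case for a large-address-aware 32-bit build running under WOW64.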

Because of the cache architecture of Quad-Core AMD Opteron processors, the /O1 switch (optimize for size) can sometimes produce faster code than /O2 (optimize for speed, as recommended above), since smaller code fits better in the instruction cache. Test your code compiled both ways and keep whichever performs best for your program.
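A minimal timing harness for the /O1 versus /O2 comparison might look like this. workload is a hypothetical stand-in for your real hot path. clock() is coarse, and on Microsoft's C runtime it measures elapsed wall-clock time rather than CPU time, so repeat each measurement and compare typical values rather than single runs.

```c
#include <time.h>

/* Hypothetical workload standing in for your program's hot path. */
unsigned long workload(unsigned long n)
{
    unsigned long sum = 0, i;
    for (i = 0; i < n; ++i)
        sum += i * i;
    return sum;
}

/* Runs the workload `reps` times and returns the elapsed time in seconds
 * as measured by clock(). Build the benchmark (this file plus a main that
 * calls time_workload) twice, e.g. with hypothetical names:
 *   cl /O1 bench.c /Febench_o1.exe
 *   cl /O2 bench.c /Febench_o2.exe
 * and compare the timings on the target machine. */
double time_workload(unsigned long n, int reps, unsigned long *checksum)
{
    clock_t start = clock();
    unsigned long acc = 0;
    int r;
    for (r = 0; r < reps; ++r)
        acc += workload(n + (unsigned long)r);  /* vary the input between calls */
    *checksum = acc;   /* expose the result so the work is not optimized away */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```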

Some Quad-Core AMD Opteron processor optimizations are available with no specific action required by developers. This is due in part to close collaboration between AMD and Microsoft engineers: AMD has a team of engineers working with Microsoft to tune compilers (including the Visual Studio 2008 compiler tools) for best performance on AMD processors while maintaining performance on other x86 chips. As a result, software developers benefit from improved instruction selection, optimized register allocation, and enhanced 128-bit floating-point performance.

To get the fastest performance on Windows, including Windows Vista, Visual Studio 2008 is the way to go, and the AMD contribution is a big part of the reason.

AMD's performance toolbox, however, has other resources adapted to the Visual Studio 2008 release, including highly optimized libraries and a software performance analyzer.

Plug-in to AMD CodeAnalyst™ Performance Analyzer for Windows

After you've written your blazing code, you will almost certainly find hot spots from which you'd like to coax better performance. Perhaps the start-up delay is too long or the frame rate is sluggish. To track down these kinds of issues, you need a highly granular code analyzer, such as AMD's CodeAnalyst Performance Analyzer. At its core, CodeAnalyst is a profiling tool with a detailed GUI, and it is also available as a Visual Studio plug-in for native code. It performs time-based profiling and event-based profiling (capturing any of the 104 processor events available on AMD chips, with the option of multiplexing between them). Moreover, it can profile on a system-wide basis: kernel code and DLLs can be included in the sampling, in addition, of course, to the application's own code.

Performance can be measured on a per-thread basis, and CodeAnalyst can analyze non-local memory accesses, which is essential for tuning NUMA designs on AMD platforms.
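To see what a hot spot looks like in practice, here is a minimal sketch of two functions that compute the same result with very different memory access patterns. The matrix size and function names are hypothetical. Profiling both, for example with event-based sampling on data-cache misses, would be expected to attribute far more misses to sum_by_columns, though the exact counts depend on the processor and its cache sizes.

```c
#include <stddef.h>
#include <stdlib.h>

#define ROWS 1024
#define COLS 1024

/* Allocates a ROWS x COLS row-major matrix filled with small integers,
 * so that sums are exact regardless of summation order. */
double *make_matrix(void)
{
    double *m = (double *)malloc(sizeof(double) * ROWS * COLS);
    size_t r, c;
    if (m == NULL)
        return NULL;
    for (r = 0; r < ROWS; ++r)
        for (c = 0; c < COLS; ++c)
            m[r * COLS + c] = (double)((r + c) % 4);
    return m;
}

/* Walks memory sequentially: consecutive accesses are adjacent. */
double sum_by_rows(const double *m)
{
    double s = 0.0;
    size_t r, c;
    for (r = 0; r < ROWS; ++r)
        for (c = 0; c < COLS; ++c)
            s += m[r * COLS + c];
    return s;
}

/* Jumps COLS elements between consecutive accesses: a likely hot spot. */
double sum_by_columns(const double *m)
{
    double s = 0.0;
    size_t r, c;
    for (c = 0; c < COLS; ++c)
        for (r = 0; r < ROWS; ++r)
            s += m[r * COLS + c];
    return s;
}
```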

CodeAnalyst bundles a tool that is not available anywhere else: a true x86-64 pipeline simulator. It is especially useful for optimizing small, tight loops where assembly-language coding is warranted. This feature is available only in the full stand-alone version of CodeAnalyst.

(Note that if these loops are part of imaging functions or codecs, it is better to use the Framewave library, discussed next, than to write assembly routines. The library wraps its assembly-language implementations in C-callable functions, so new versions of the library can take advantage of new processor features without requiring you to change your code.)

One of the difficult problems with assembly-language work on today's x86-64 processors is that so many instructions execute simultaneously that it is very difficult to measure exactly the performance of a given sequence of instructions. Without this knowledge, low-level optimization is nearly impossible. The pipeline simulator solves this problem by letting developers watch instructions proceed through the processor pipeline and execute, which makes fine-grained tuning possible. The simulator works on both 32- and 64-bit code.
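The effect a pipeline view makes visible can be illustrated with a simple reduction. In the sketch below (function names are hypothetical), the single-accumulator loop forms one long dependency chain, while the four-accumulator version exposes independent work that an out-of-order core can overlap. Actual speedups depend on the processor's floating-point add latency and throughput, which is precisely the kind of detail a simulator helps you see.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous result, so the
 * loop is limited by floating-point add latency, however many execution
 * units the core has. */
double sum_one_chain(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Four independent accumulators: the four chains can be in flight in the
 * pipeline at the same time. This changes the order of the additions, so
 * for general inputs the result can differ in the last bits from
 * sum_one_chain. */
double sum_four_chains(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```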

CodeAnalyst plugs into Visual Studio 2008 and is a valuable tool that should be part of your armory.

Easily Integrate the Framewave Library

Framewave is a large, open-source collection of C/C++-callable functions that span a wide range of multimedia and imaging needs. The library runs on AMD and other x86 processors and provides a stable API, even as the underlying implementations change to take advantage of the latest features of the silicon. Version 1.3 of the library takes advantage of the Quad-Core AMD Opteron processor's software-visible features (see the Shanghai Zone for more details on these features).

Framewave has historically focused primarily on imaging and signal-processing functions. The routines in v. 1.3 of the library (released in December 2008) are grouped into:

  • Data exchange and initialization: functions that initialize data buffers, copy data between buffers, convert data types, and scale image data.
  • Arithmetic and logic functions: math functions that relate to image processing.
  • Color conversion functions, including general color conversion and color space conversion using 3D lookup with trilinear interpolation (3D-LUT functions).
  • Threshold and compare functions: these compare image data and, based on the comparison results, perform different manipulations.
  • Geometric transforms: functions to warp, shear, resize, mirror, and rotate images.
  • Digital filter functions: intended for altering frequency-related visual properties of images.
  • JPEG encoding and decoding functions.
  • Autocorrelation
  • H.264, MPEG-1 and MPEG-2 decoders.
  • Miscellaneous support routines.
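To give a concrete sense of what one of these routines does, here is a plain-C reference for a saturating add of two 8-bit, single-channel images with a per-row stride. It is a hypothetical illustration written for this article, not Framewave's API; consult the Framewave documentation for the library's actual function names and parameters.

```c
#include <stddef.h>

/* Hypothetical reference routine (NOT the Framewave API).
 * Adds two 8-bit single-channel images pixel by pixel, clamping at 255.
 * Each row starts `*_step` bytes after the previous one, so rows may
 * carry padding that is left untouched. */
void add_u8_sat(const unsigned char *src1, size_t src1_step,
                const unsigned char *src2, size_t src2_step,
                unsigned char *dst, size_t dst_step,
                size_t width, size_t height)
{
    size_t y, x;
    for (y = 0; y < height; ++y) {
        const unsigned char *a = src1 + y * src1_step;
        const unsigned char *b = src2 + y * src2_step;
        unsigned char *d = dst + y * dst_step;
        for (x = 0; x < width; ++x) {
            unsigned int s = (unsigned int)a[x] + b[x];
            d[x] = (unsigned char)(s > 255u ? 255u : s);   /* saturate */
        }
    }
}
```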

When run on Quad-Core AMD Opteron processors, Framewave leverages features such as the SSE4a instructions and SSE128 (128-bit-wide SSE execution) for greatly increased SIMD throughput. By using the library rather than coding for these features directly, developers obtain the benefits without having to change code. In addition, Framewave is internally multithreaded, taking advantage of multiple processor cores without complex involvement by the developer.
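SIMD throughput comes from processing many pixels per instruction. A minimal sketch using the SSE2 intrinsic _mm_adds_epu8 (declared in emmintrin.h), which adds 16 unsigned bytes with saturation at once, is shown below; it falls back to scalar code when SSE2 is not available. The function name is hypothetical, and this is not Framewave source code, only an illustration of the kind of code path such a library hides behind its API.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical routine: saturating add of two byte arrays.
 * With SSE2, each _mm_adds_epu8 handles 16 bytes. On processors whose
 * SSE units are 128 bits wide (as AMD describes SSE128), such an
 * operation executes at full width rather than as two 64-bit halves. */
void add_bytes_sat(const unsigned char *a, const unsigned char *b,
                   unsigned char *dst, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    for (; i < n; ++i) {   /* scalar tail, or the whole array without SSE2 */
        unsigned int s = (unsigned int)a[i] + b[i];
        dst[i] = (unsigned char)(s > 255u ? 255u : s);
    }
}
```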

Framewave is available for Windows (32- and 64-bit versions) as a DLL, and static libraries compatible with Microsoft Visual Studio can be built from the source.

Note: developers who want a similar library of high-speed functions for linear algebra (BLAS and LAPACK), FFTs, transcendental math, and random-number generation should consider AMD's Core Math Library (ACML), which is also available at no cost.

Going Forward

AMD is pushing for advances in other areas to help developers get the best possible performance. One of these, the Lightweight Profiling (LWP) proposal, solves an important problem in measuring performance. The problem faced by all performance profilers is the observer effect: By using software to measure the performance of the processor, the processor's performance is changed. The LWP proposal is the first under AMD's Hardware Extensions for Software Parallelism and calls for a set of instructions that work with a dedicated portion of x86 silicon to store data about processor operations. The LWP instructions read this data (and also turn profiling on and off). By having lightweight profiling in silicon rather than in software, more accurate data can be returned. This has obvious implications for straight-ahead tools such as profilers and even some debuggers. But it also has benefits in other key areas: For example, execution frameworks such as Java and .NET could monitor code performance and optimize JIT code on the fly based on LWP-generated data.

The LWP proposal addresses an important problem. It shows AMD's continued efforts to advance market leadership in performance-oriented computing. Should the proposal come to fruition, we can expect the technology to be leveraged by most significant tools and IDEs, potentially including future releases of Visual Studio and the .NET Framework.


In Sum

The combination of Microsoft Visual Studio 2008, the AMD CodeAnalyst Performance Analyzer, and the Framewave and ACML libraries provides highly optimized code generation for x86-64 processors in a truly integrated package, and future plans are likely to add even more performance tools to it. To begin enjoying greater performance, use the links that follow to obtain the tools and additional useful information.

Resources

Anderson Bailey is a developer with a longstanding interest in the techniques for using code to exploit processor features. He can be reached at chip.coder@gmail.com.

© 2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.
