Performance Optimization of 64-bit Windows Applications for AMD Athlon™ 64 and AMD Opteron™ Processors using Microsoft Visual Studio 2005 
Michael Wall, Senior Member of Technical Staff, Advanced Micro Devices Inc.  10/11/2005 

We've tried a couple of optimization techniques from our basic bag-o-tricks, and obtained good results just by guessing. What next? How can we get even more performance? At this point, we really need to understand the details of where the code is spending its time. It's time to profile the code and measure what is really happening.

A profiler is an essential tool for any performance-oriented development project. It allows you to view details of the program execution. Timer-based profiling shows where the code spends time. Event-based profiling counts interesting hardware events, like cache refills and mispredicted branches.
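
To make the idea of timer-based profiling concrete: a sampling profiler interrupts the program at a fixed interval, records the current instruction pointer, and later attributes each sample to the code that contains that address. The following is a minimal sketch of the attribution and ranking step only. The `Region` type, the function name, and all addresses are hypothetical, and this is not a description of how CodeAnalyst is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one code region (a function or module)
// covering the address range [begin, end).
struct Region {
    std::string name;
    std::uint64_t begin;
    std::uint64_t end;
    std::size_t samples;  // filled in by rank_by_samples
};

// Attribute each sampled instruction pointer to the region that contains it,
// then order the regions so the most active one comes first.
std::vector<Region> rank_by_samples(std::vector<Region> regions,
                                    const std::vector<std::uint64_t>& ips) {
    for (Region& r : regions) {
        r.samples = 0;
    }
    for (std::uint64_t ip : ips) {
        for (Region& r : regions) {
            if (ip >= r.begin && ip < r.end) {
                ++r.samples;
                break;  // a sample belongs to at most one region
            }
        }
        // Samples outside every known region are simply ignored here.
    }
    std::stable_sort(regions.begin(), regions.end(),
                     [](const Region& a, const Region& b) {
                         return a.samples > b.samples;
                     });
    return regions;
}
```

A region that collects most of the samples is where the program spent most of its time during the capture, which is the kind of ranking a profiler presents per module and per function.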

AMD CodeAnalyst does all this, and more. Download CodeAnalyst from AMD Developer Central and install it. Run CodeAnalyst, and read through the step-by-step tutorial under the Help menu. (Note that CodeAnalyst requires the VS2005 compiler setting /Zi and the linker setting /DEBUG in order to create proper symbol information.)

Now create a new CodeAnalyst project for the Mandel application:

Project directory: some temporary directory, like C:\temp

Project name: any name you like, let's use "mvec"

Working directory: the location of the .exe, C:\mandel\x64\release

Launch app: select mandel.exe in the working directory

Use the default "Timer Trigger," set the duration to 20 seconds, and check the "Terminate app?" box. Then click OK. We have set CodeAnalyst to launch the Mandel program, capture 20 seconds of timer-based profile data, then stop the program.

Click the triangular "Start" button, and the application should launch. Be sure to keep clicking the mouse and zooming in to the Mandelbrot set, so the main loop code is being exercised constantly. This will give relevant sample data, instead of measuring the idle loop.

After 20 seconds, Mandel will exit and CodeAnalyst will open a Session window with all your Timer Based Profile (TBP) sample data.

Click the System Graph tab, and you should see Mandel.exe taking the lion's share of the time. You can double-click the bars and drill down to the assembly code, with timer counts associated with the individual instructions.

Now click the System Data tab. The modules are ranked according to activity, with the most active module at the top. Presumably, this will be Mandel.exe if you kept the program busy while it ran.

Double-click mandel.exe, and you will see which functions inside Mandel.exe were the most active. The mandel function (main function) should be the most active, by far. Double-click it, and you will see the final destination: the source, ASM code, and corresponding sample counts. There is also a module navigator section showing your current position in the module, and a top-level graphical overview of the hot spots in the module.

Figure 3

This view shows a lot of useful info. Scroll through the code, and see where most of the time is spent. Click the little square box to expand a source line, and see the corresponding ASM code. Source and ASM don't always line up perfectly, but it's close. Note: a slow instruction will generally produce a large sample count on the next instruction, because the instruction pointer (IP) recorded when the sample is taken has usually already advanced to that following instruction.

Notice that the vector calculations like MULPS and ADDPS take a lot of time. This is expected. However, you can also see the SHUFPS/COMISS instructions taking lots of time. This might be a surprise, but this kind of overhead for rearranging data is an example of the tradeoffs often encountered when writing vectorized code. How can it be improved? (Note: if you're not really interested in vectorization, skip ahead to the dual-core section.)
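
The Mandel source is not reproduced on this page, so the following is only an illustrative sketch of the kind of code that mixes these instructions. It computes four squared magnitudes at once, then tests each lane separately. The function name `escaped_mask` and its parameters are hypothetical, and a compiler is free to emit a different instruction sequence than the comments suggest.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Returns a 4-bit mask in which bit i is set when lane i satisfies
// re[i]*re[i] + im[i]*im[i] > limit. Hypothetical helper for illustration.
int escaped_mask(__m128 re, __m128 im, float limit) {
    // Vector part: four magnitudes from two multiplies (MULPS) and one add (ADDPS).
    __m128 mag2 = _mm_add_ps(_mm_mul_ps(re, re), _mm_mul_ps(im, im));
    __m128 lim = _mm_set_ss(limit);

    // Scalar part: the COMISS-style compare only looks at lane 0, so every
    // other lane must first be moved into lane 0 with a shuffle (SHUFPS).
    __m128 lane1 = _mm_shuffle_ps(mag2, mag2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 lane2 = _mm_shuffle_ps(mag2, mag2, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 lane3 = _mm_shuffle_ps(mag2, mag2, _MM_SHUFFLE(3, 3, 3, 3));

    int mask = 0;
    mask |= _mm_comigt_ss(mag2, lim);        // lane 0
    mask |= _mm_comigt_ss(lane1, lim) << 1;  // lane 1
    mask |= _mm_comigt_ss(lane2, lim) << 2;  // lane 2
    mask |= _mm_comigt_ss(lane3, lim) << 3;  // lane 3
    return mask;
}
```

The arithmetic handles four values per instruction, but the per-lane tests need three shuffles and four scalar compares. That rearranging work does no useful arithmetic, which is why it can show up prominently in the sample counts.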

2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.