Basic Performance Measurements for AMD Athlon™ 64 and AMD Opteron™ Processors 
Paul J. Drongowski  12/12/2006 

To measure DTLB performance, we collected sample data for four events: Retired Instructions, Data Cache Accesses, L1 DTLB Miss and L2 DTLB Hit, and L1 DTLB and L2 DTLB Miss. (See Section 6.5.1.)

  Event                Classic function  Improved function
  Abbreviation        multiply_matrices  multiply_matrices
  ------------------  -----------------  -----------------
  Ret_instructions               68,180             88,150 samples
  DC_accesses                   402,415            602,298 samples
  DTLB_L1M_L2H                   59,532                 53 samples
  DTLB_L1M_L2M                  157,529                175 samples

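In their paper titled "On Reducing TLB Misses in Matrix Multiplication," Kazushige Goto and Robert van de Geijn assert that translation lookaside buffer misses are the limiting factor in fast matrix multiplication. The event data supports their claim: the classic function produced 217,061 L1 DTLB miss samples (59,532 + 157,529), while the improved function produced only 228 (53 + 175).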

Derived measurements were computed from the event data:

                       Classic function  Improved function
  Measurement         multiply_matrices  multiply_matrices
  ------------------  -----------------  -----------------
  Elapsed time                  13.2340             3.4370 seconds
  L1 DTLB req rate               0.5902             0.6833
  L1 DTLB miss rate              0.3184             0.0003
  L1 DTLB miss ratio             0.5394             0.0004
  L2 DTLB req rate               0.3184             0.0003
  L2 DTLB miss rate              0.2310             0.0002
  L2 DTLB miss ratio             0.7257             0.7675
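
The derived values can be reproduced from the sample counts with a few divisions. The sketch below is a reconstruction, not the article's own tooling. The formulas are inferred from the measurement names. The factor of 10 (`PERIOD_RATIO`) is an assumption: this section does not state the sampling periods, but the published rates are reproduced only if Retired Instructions was sampled with a period ten times longer than the other events. All function and variable names are hypothetical.

```python
# Sample counts copied from the event table in this section.
SAMPLES = {
    "classic": {
        "ret_instructions": 68_180,
        "dc_accesses": 402_415,
        "dtlb_l1m_l2h": 59_532,
        "dtlb_l1m_l2m": 157_529,
    },
    "improved": {
        "ret_instructions": 88_150,
        "dc_accesses": 602_298,
        "dtlb_l1m_l2h": 53,
        "dtlb_l1m_l2m": 175,
    },
}

# ASSUMPTION (not stated in this section): Retired Instructions was sampled
# with a period 10x longer than the other events. This value is chosen only
# because it reproduces the published rates.
PERIOD_RATIO = 10


def derive(samples, period_ratio=PERIOD_RATIO):
    """Return DTLB rates (per retired instruction) and miss ratios."""
    # Scale instruction samples so all counts are in the same units.
    instructions = samples["ret_instructions"] * period_ratio
    # Every L1 DTLB miss either hits or misses in the L2 DTLB.
    l1_misses = samples["dtlb_l1m_l2h"] + samples["dtlb_l1m_l2m"]
    l2_misses = samples["dtlb_l1m_l2m"]
    return {
        "L1 DTLB req rate": samples["dc_accesses"] / instructions,
        "L1 DTLB miss rate": l1_misses / instructions,
        "L1 DTLB miss ratio": l1_misses / samples["dc_accesses"],
        # Each L1 DTLB miss becomes an L2 DTLB request.
        "L2 DTLB req rate": l1_misses / instructions,
        "L2 DTLB miss rate": l2_misses / instructions,
        "L2 DTLB miss ratio": l2_misses / l1_misses,
    }


if __name__ == "__main__":
    for name, counts in SAMPLES.items():
        print(name)
        for label, value in derive(counts).items():
            print(f"  {label:20s} {value:.4f}")
```

Under these assumptions, a rate is events per retired instruction, and a ratio is misses per request at that TLB level. The L2 DTLB request rate equals the L1 DTLB miss rate by construction.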

The L1 DTLB request rate is higher for the improved version because it performs more memory access operations than the classic version. For the classic (textbook) program, an L1 DTLB miss occurs every 3.1 instructions and an L2 DTLB miss occurs every 4.3 instructions, which is clearly unacceptable. The improved matrix multiplication program executes at least 3,300 instructions per DTLB miss.
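
The instructions-per-miss figures are the reciprocals of the miss rates in the derived-measurement table. The following minimal sketch shows the arithmetic. It uses the rounded table values as published, so the results are approximate, and the helper name `instructions_per_miss` is hypothetical.

```python
def instructions_per_miss(miss_rate):
    """Convert a miss rate (misses per instruction) to instructions per miss."""
    return 1.0 / miss_rate


# Miss rates copied from the derived-measurement table (rounded as published).
MISS_RATES = {
    "classic L1 DTLB": 0.3184,
    "classic L2 DTLB": 0.2310,
    "improved L1 DTLB": 0.0003,
}

if __name__ == "__main__":
    for label, rate in MISS_RATES.items():
        print(f"{label:18s} {instructions_per_miss(rate):10.1f} instructions per miss")
```

Because the improved-version rate is rounded to a single significant digit, its reciprocal (about 3,333) is only a coarse estimate, in line with the "at least 3,300" figure in the text.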

2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.
