Skip navigation links
Tools
SDKs
Libraries
Samples & Demos
Docs
Zones
Community
Support
Optimizing Your C/C++ Applications, Part 1 
Skip Navigation LinksHome > Docs > Articles & Whitepapers
C/C++ optimizing compilers are great--but there *are* a few techniques for hand-tuning your code to run as efficiently as possible on the AMD Athlon64 and Opteron processors, especially when developing DLLs, device drivers, or other performance-bound pieces of code.
Alan Zeichick   6/21/2004 
Sidebar: Know the Code
A code profiler can help determine which parts of your application should be optimized. You should always use a profiler both before and after optimization to see if your change actually made an improvement—with C/C++, you can never be sure if your hand-tuning methods are better than those of your optimizing compiler!

The smart engineers at AMD have spent a lot of time identifying places where C/C++ developers can optimize code for best performance. Some of these optimizations might look familiar—they're generally great coding practice for any 32-bit or 64-bit platform or operating system. In fact, most of these suggestions are appropriate whether you're developing 32-bit or 64-bit apps. Others are specifically for 64-bit applications running on the AMD64 architecture. This article will look at six optimizations; Part 2 will examine eight more.

#1. Declare as static functions that are not used outside the file where they are defined

Declaring a function as static forces an internal linkage within that file, which can improve the performance of the code. Functions that are not declared as static default to external linkage, which may inhibit certain optimizations, such as aggressive inlining, with some C/C++ compilers.

#2: Use array notation instead of pointer notation when working with arrays

You can use either array operators ([]) or pointers to access the elements of an array. However, it's hard for the compiler to optimize pointers because the pointers are, by default, less constrained—the compiler has to assume that the pointer can read/write to any location in memory. Because array notation is less ambiguous, the compiler may be able to optimize it more effectively.

Because it's hard to know which way is more efficient, you may want to try coding your array reference using both pointers and array notation, and use a code profiler to see which way is fastest for your particular application.

#3. Completely unroll loops that have a small fixed loop count and a small loop body

As you know, many compilers do not aggressively unroll loops. Manually unrolling loops can benefit performance, because you're avoiding the loop overhead; for a small loop, that overhead can be significant.

For example, avoid a small loop like this:

// 3D-transform: Multiply vector V by 4x4 transform matrix M.
for (i = 0; i < 4; i++) {
   r[i] = 0;
   for (j = 0; j < 4; j++) {
      r[i] += m[j][i] * v[j];
   }
}

Instead, replace it with its completely unrolled equivalent, as shown here:

r[0] = m[0][0] * v[0] + m[1][0] * v[1] + m[2][0] * v[2] + m[3][0] * v[3];
r[1] = m[0][1] * v[0] + m[1][1] * v[1] + m[2][1] * v[2] + m[3][1] * v[3];
r[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2] + m[3][2] * v[3];
r[3] = m[0][3] * v[0] + m[1][3] * v[1] + m[2][3] * v[2] + m[3][3] * v[3];
#4. Consider expression 

#4. Consider expression order when coding a compound branch

Complex branches—those which consist of many Boolean expressions with logical AND (&&) and logical OR (||) operators—provide great opportunities for optimization, if you know which AND and OR conditions are more likely to occur.

That's because the C/C++ compiler will optimize the code to terminate execution of that complex branch as soon as it's sure if the operand evaluates to true or false, depending on the logic. For example, the branch conditions are satisfied as soon as it detects a true operand in an OR expression; similarly, the application knows that the branch will fail as soon as it sees a false operand in an AND expression.

You might experiment with changing the order of the operands if you know that a true/false outcome is more probable for a specific operand. For example, in an AND expression, place the operand most likely to be false up front. Of course, you'll need to make sure that changing these order of the operands won't cause any unwanted side effects.

#5. Leverage the early termination abilities of complex If statements

As seen above, as soon as a true operand is found in an OR expression, or a false one is found in an AND expression, execution of that particular expression is terminated because outcome is known. You can leverage that, if you have a good idea as to whether an entire expression is likely to be true or false, to try to short-cut execution of the condition.

For example, say you're testing to see if a and b are going to be within a specified numeric range. If you know that those numbers will generally be within that range, you can use the following because the most likely outcome will evaluate to false and thereby terminate the AND condition:

  if (a <= max && a >= min && b <= max && b >= min)

If you know that the data will generally not be in a specified range, you can logically reverse the If statement to optimize execution in the same way, with a true terminating the OR condition:

  if (a > max || a < min || b > max || b < min)

#6. Use if-else statements in place of switch statements that have noncontiguous case expressions

If the case expressions are contiguous or nearly contiguous integer values, most compilers translate the switch statement as a jump table instead of a comparison chain. Jump tables generally improve performance because they reduce the number of branches to a single procedure call, and shrink the size of the control-flow code no matter how many cases there are. The amount of control-flow code that the processor must execute is also the same for all values of the switch expression.

However, if the case expressions are noncontiguous values, most compilers translate the switch statement as a comparison chain. Comparison chains are undesirable because they use dense sequences of conditional branches, which interfere with the processor's ability to successfully perform branch prediction. Also, the amount of control-flow code increases with the number of cases, and the amount of control-flow code that the processor must execute varies with the value of the switch expression.

For example, if the case expression are contiguous integers, a switch statement can provide good performance:

switch (grade)
{
   case 'A':
      ...
      break;
   case 'B':
      ...
      break;
   case 'C':
      ...
      break;
   case 'D':
      ...
      break;
   case 'F':
      ...
      break;

But if the case expression aren't contiguous, the compiler may likely translate the code into a comparison chain instead of a jump table, and that can be slow:

switch (a)
{
   case 8:
      // Sequence for a==8
      break;
   case 16:
      // Sequence for a==16
      break;
   ...
   default:
      // Default sequence
      break;
}

In cases like this, replace the switch with a series of if-else statements:

if (a==8) {
   // Sequence for a==8
}
else if (a==16) {
   // Sequence for a==16
}
...
else {
   // Default sequence
}

Tune in next time
In the second part of this series, we'll look at eight more optimizations, focused on data structure alignment, integer operations and floating-point math.

A former mainframe software developer and systems analyst, Alan Zeichick is principal analyst at Camden Associates, an independent technology research firm focusing on networking, storage, and software development.

Back to top
2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.

This website may be linked to other websites which are not in the control of and are not maintained by AMD. AMD is not responsible for the content of those sites. AMD provides these links to you only as a convenience, and the inclusion of any link to such sites does not imply endorsement by AMD of those sites. AMD reserves the right to terminate any link or linking program at any time.