Skip navigation links
Tools
SDKs
Libraries
Samples & Demos
Docs
Zones
Community
Support
Crushing the Numbers: A Mandelbrot Benchmark 
Skip Navigation LinksHome > Docs > Articles & Whitepapers
When it comes to raw performance, the mathematics behind the Mandlebrot set is the perfect proving ground for AMD64. Mike Wall takes us to the code—and beyond.
Mike Wall  8/26/2004 

Two decades ago, fractals burst onto the scene. Fractals are fascinating chaotic geometric shapes that have the characteristic that no matter how much you zoom in on them, they're always uneven. One of the most famous fractals is what's called the Mandelbrot set, named after Benoit Mandelbrot, who discovered fractal geometry in the 1970s. For years, computer scientists, mathematicians and interested hobbyists have used PCs to generate beautiful fractal diagrams, including the Mandelbrot set, as shown in Figure 1.

Figure 1: The famous Mandelbrot set. Source: AMD
Figure 1: Click to enlarge.

The Mandelbrot set itself is developed by a recursive formula: Zn = Zn-12 + C. Here, C is an initial starting point on the complex plane, and Z is a spot inside the Mandelbrot set. It's a deceptively simple formula, but executing it many times and plotting the results provides graphically striking images.

Because the Mandelbrot formula is processor-intensive, it makes an excellent test case for seeing the differences between 32-bit and 64-bit computation using the AMD64 architecture. For our adventure through the Mandelbrot set, I've used the Visual Studio 2005 beta and a program I've written called Mandelblaze. You can download the source code and project files here. You'll need to be running one of the betas of 64-bit Windows XP or Windows Server 2003 to run these tests.

More About Fractals
There are many great sites on the Web that contain information about fractals and the Mandelbrot set. One is the Fractal Geometry site, at , but a quick browse through your favorite search engine should bring up many pages and cool figures.

Dr. Mandelbrot's original work in this area can be found in "The Fractal Geometry of Nature," from W.H. Freeman & Company, published in 1982; in 2004, he also published, "Fractals in Nature," from Springer-Verlag.

Many of the uses of fractal mathematics have been found in financial analysis. A fascinating paper by Dr. Mandelbrot is A Multifractal Walk Down Wall Street, originally published in Scientific American in February 1999.

If you don't have VS 2005, you can use batch files and the AMD64 Platform SDK and command-line tools to run the tests. Or you can just view the source code and the figures in this article.

Note: When installing Visual Studio 2005 beta, you must manually choose to install the x64 compiler and libraries; they are not installed by default. In addition to Visual Studio 2005 (or the PSDK), you also need to install the latest version of the DirectX 9.0 SDK.

Step 1: Compile and benchmark the 32-bit baseline code
Let's start by using Visual Studio 2005 to build the baseline project file, and measure the performance of the original code before optimization. On your 64-bit machine, use Windows Explorer to navigate to the folder that contains the C++ project. Double-click mandelblaze.sln to open the solution in Visual Studio 2005, and open the source file mandelblaze.cpp

Look at the code in the Main Mandelbrot Function which is called BOOL mandel(). There is a baseline main loop, and two alternate versions of the loop. The baseline is called "BASIC DOUBLE PRECISION INNER LOOP", and it should be active (not commented out) in the source. The two alternate loops should be commented out for now.

Read the code for the basic inner loop, especially the 7 lines of code inside the "for" loop. You can see this code is performing double precision floating point calculations. Observe that it repeatedly updates the variables z_real and z_imag with new calculated values, up to 1000 times:


for (int i = 0; i < 1000; i++) {
 double a = z_real * z_real;
 double b = z_imag * z_imag;
 double c = z_real * z_imag;
 c = c + c; // c * 2.0
 z_real = a - b + c_real;
 z_imag = c + c_imag;
 if((a + b) > 4.0) { break; }
}
Figure 2: Running Mandelblaze in 32 bits. Performance is 1.07 gigaflops. Source: AMD
Figure 2: Click to enlarge.

Now, let's compile and benchmark the code for 32-bit mode. Select Release Configuration, and Win32 platform. From the Project menu, click Mandelblaze Properties…, expand C/C++, change Optimization to Maximize Speed (/O2), and click OK. Then, from the Build menu, click Build Solution, and from the Debug menu, click Start Without Debugging. A new window should open, displaying the Mandelbrot Set. Zoom into the main region (solid blue) to generate an accurate gigaflops performance number; the picture should look something like Figure 2.

Observe the text in the window's title bar. It includes benchmark info measured in gigaflops, which will be updated every time the Mandelbrot set is calculated. A gigaflop is 1,000,000,000 double precision floating point operations per second. On my system, the 32-bit result was 1.07 gigaflops.

Step 2: Compile the baseline code for 64-bit performance

Mandelblaze Navigation
You can use the mouse to navigate around the Mandelbrot set —there are some interesting areas over at the edges of the set. The controls are:

Left-click: zoom in
Right-click: zoom out
Move mouse to edge of window: scroll
ESC key: exit

Now, let's compile and benchmark the code for 64-bit mode. Select Release Configuration, and Win64 (AMD64) platform. From the Project menu, click Mandelblaze Properties…, expand C/C++, click Optimization, and change the first field, Optimization, from Custom to Maximize Speed (/O2), and click OK.

Then from the Build menu, click Build Solution, and from the Debug menu, click Start Without Debugging. The Mandelblaze window should open. This is now 64-bit code, compiled with the optimized 64-bit compiler.

Zoom into the blue region to generate an the 64-bit gigaflops number. It should be higher than the baseline, indicating the 64-bit code is faster than the 32-bit code. Figure 3 shows my result, which is 1.22 gigaflops.

Figure 3: Running Mandelblaze in 64 bits. Performance is 1.22 gigaflops. Source: AMD
Figure 3: Click to enlarge.

Step 3: Optimize 32-bit code by adding a second, separate dependency chain
We can make the Mandelblaze code run a lot faster by streamlining the inner loop. We'll do that by adding a second set of calculations which proceeds in parallel with the first set. This will give the CPU more independent floating point instructions to process, to keep the floating point units in the CPU efficiently busy.

First, comment out the baseline loop. You can do that by using "begin comment" characters /* and "end comment" characters */ to comment out the baseline loop (about 20 lines of code).

Next, un-comment the optimized loop. Right below the baseline loop, you should see the optimized loop which is called DOUBLE PRECISION INNER LOOP WITH SEPARATE DEPENDENCY CHAINS. Activate this loop by removing the "begin comment" and "end comment" characters at the start and end of the loop.

As you can see, the optimized code performs the same calculations as the baseline code, but it processes two completely separate Mandelbrot pixels within the loop. The two pixels are calculated with separate variables that have no data inter-dependencies.

To run this code, select Release configuration, and Win32 platform. From the Build menu, click Build Solution. From the Debug menu, click Start Without Debugging. The Mandelblaze window should open. This is the optimized 32-bit build.

Zoom into the blue to get the optimized 32-bit performance score. Figure 4 shows that on my system, the result was 1.56 gigaflops—the highest number so far. That's a good demonstration of the benefit of optimizing an algorithm!

Figure 4: The optimized Mandelblaze program, running in 32 bits. Performance is 1.56 gigaflops. Source: AMD
Figure 4: Click to enlarge.

Close the Mandelblaze window using the ESC key. Look at the C/C++ project settings (Project / Mandelblaze Properties / C/C++ / Code Generation). Notice the Enable Enhanced Instruction Set Option, /arch:SSE2. This allows the 32-bit compiler to emit SSE and SSE2 instructions for best performance. Right below it, the /fp:fast allows the 32-bit compiler (and the 64-bit compiler) to perform certain speed optimizations that may produce slightly different numerical floating point results, but run faster.


Step 4: Compile and benchmark the optimized code for AMD64 (64-bit mode)
To run the 64-bit version of the optimized program, select Release Configuration, and Win64 (AMD64) platform. From the Build menu, click Build Solution. From the Debug menu, click Start Without Debugging. The Mandelblaze window should open. This is the optimized 64-bit build.

Zoom into the blue to see the optimized 64-bit performance. This should be much higher than the optimized 32-bit performance. My test, seen in Figure 5, shows 2.06 gigaflops.

Figure 5: The optimized Mandelblaze program, running in 64 bits. Performance is 2.06 gigaflops. Source: AMD
Figure 5: Click to enlarge.

You can see why the 64-bit code is so much faster by examining the assembly code output. Compare 64-bit main loop in amd64/release/mandelblaze.cod vs. 32-bit main loop in release/mandelblaze.cod. The 64-bit code can keep all the variables in SSE registers, for fast access. The 32-bit code can only use eight SSE registers; it must use memory variables for the rest, which reside in cache but are still slower than using registers.


The benefits are in the benchmarks
We've just seen how a single source file for both 32-bit and 64-bit targets can run unchanged under 64-bit Windows—and the 64-bit code is definitely faster. Using source-level optimizations to help the performance of the floating point code also offers a tremendous payoff.

Using 64-bit Integers to Make an Extended-Precision Inner Loop
Read on...

Indeed, the technique of coding two separate dependency chains in the main loop, which offers more independent variables and operations that can be performed in parallel or out-of-order by the pipelined Floating Point Units in the CPU, works especially well in 64-bit mode, because the 16 SSE registers available in 64-bit mode permit all the variables to be stored in registers. As a result, the 64-bit code performs significantly better than the 32-bit code.

Mike Wall is a member of the technical staff at Advanced Micro Devices Inc.

 

Using 64-bit Integers to Make an Extended-Precision Inner Loop

We can increase the accuracy of the calculations by using 64-bit integers. The integer code will run fast, while enabling higher numerical precision than the floating point variables. This "Big Number Math" with 64-bit integer multiplication can be executed efficiently because AMD64 supports true 64x64=128 bit multiplication in hardware.

To set that up, comment out the optimized floating point loop by using the "begin comment" characters /* and "end comment" characters */ to comment out the optimized floating point loop (about 60 lines of code). Then, un-comment the 64-bit integer loop. Right below the optimized floating point loop, you should see the 64-BIT INTEGER LOOP FOR EXTENDED PRECISION. Activate this loop by removing the "begin comment" and "end comment" characters at the start and end of the loop.

Look at this 64-bit integer code. Again, it performs the same basic calculations as the other loops did, but it uses the INT64 data type instead of floating point. What is the advantage? Double precision floating point numbers are 64 bits in size, but they have only 53 significant bits; the rest are used for the exponent and the sign. The Mandelbrot set only requires numbers in the approximate range +/- 4, so we can use 64-bit integers instead of floating point, and get about 8 more significant bits.

Observe the use of the compiler intrinsic function __mulh which performs a full 64x64=128 bit integer multiply, and returns the most significant 64 bits of the result. This operation is compiled and executed efficiently as native 64-bit integer instructions in the AMD64 instruction set.

(Notice how the "binary point" is manually adjusted using fast shift operators "<<" and ">>". Also, ordinary addition operator "+" when applied to an INT64 data type, efficiently does a 64-bit addition using the native 64-bit add instructions. Look at the .cod file and see the actual integer 64-bit imul, add, sub, and other instructions.)

Figure 6: The optimized Mandelblaze program, rewritten to use 64-bit integers. Performance is 1.50 gigaflops. Source: AMD
Figure 6: The optimized Mandelblaze program, rewritten to use 64-bit integers. Performance is 1.50 gigaflops. Source: AMD

To compile and benchmark the INT64 code for AMD64 (64-bit mode), select Release Configuration, and Win64 (AMD64) platform. From the Build menu, click Build Solution. From the Debug menu, click Start Without Debugging. The Mandelblaze window should open. This is the 64-bit integer "Big Number Math" build.

As you can see from Figure 6, the performance of this version is somewhat slower than the floating point version, but the numerical precision is much greater, theoretically allowing the code to zoom in much deeper than the floating point code.

Back to top
2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.

This website may be linked to other websites which are not in the control of and are not maintained by AMD. AMD is not responsible for the content of those sites. AMD provides these links to you only as a convenience, and the inclusion of any link to such sites does not imply endorsement by AMD of those sites. AMD reserves the right to terminate any link or linking program at any time.