The x86 instruction set is an industry standard, found in Intel's Pentium, Celeron, and Xeon processors, in AMD's Opteron, Athlon, Sempron, and Turion processors, and even in chips from other manufacturers, such as VIA Technologies' C3 processor family.
While the hardware specifications and interfaces of those x86-compatible chips vary, along with power consumption and performance characteristics, they all conform to the core x86 instruction-set standards. Many also offer extensions to that x86 architecture, and many of those extensions have been implemented by several vendors. This means binary portability: a compiler targeting one x86 processor should create object code and binaries that run on any contemporary x86 processor with those same extensions. The use of standards provides economy of scale, gives customers choices, and fosters innovation in the implementation of those standards.
There are many extensions to the x86 architecture, such as the original MMX multimedia extensions, which were then enhanced by SSE (Streaming SIMD Extensions), then SSE2 and, more recently, SSE3. Those are all found on processors made by Intel and by other companies, including AMD.
Another popular extension adds 64-bit capabilities to the 32-bit x86 architecture. These extensions, which contain both new instructions and new registers, were initially developed by AMD as part of its AMD64 architecture, and were later implemented in Intel's newest Xeon processors as Intel Extended Memory 64 Technology (Intel EM64T).
It's good that AMD and Intel, the two microprocessor giants, as well as smaller companies like VIA, use the same core instruction-set standard. This allows the companies to innovate and compete on implementation, while we developers and end users can be assured that compiled code will run on any x86-based system with the right instruction-set extensions.
As you probably know, there are many excellent native compilers available for x86 processors, and most of these compilers can handle most (or all) instruction-set extensions when targeting Linux, Unix, and Windows. Developers can choose from a wide range of compilers, including the open-source GNU Compiler Collection, and outstanding offerings from PathScale, Portland Group, Intel, Sun, and Microsoft.
Unfortunately, there are sometimes a few glitches. Intel's compilers, for example, don't always produce code that runs on non-Intel chips. But for the most part, all of those compilers target the x86 instructions, perform instruction-set optimizations, and generate binaries that generally work fine on all x86 chips.
Of course, when you write an application, there's no way for your toolchain to know what type of runtime system you're targeting. So, by default, the compiler targets the widest possible installed base, by generating binaries for an x86 processor with few or no extensions. If you want to optimize your code to take advantage of specific processor hardware extensions (such as the SSE2 or SSE3 floating-point instructions and registers, or the 64-bit x86 extensions that AMD calls AMD64 and Intel calls Intel EM64T), you use compiler switches.
For example, the Intel 8.1 C/C++ compiler uses the flag -xN (for Linux) or -QxN (for Windows) to take advantage of the SSE2 extensions. For SSE3, the compiler switch is -xP (for Linux) and -QxP (for Windows). This instructs the compiler to use the new extensions and registers. However, code that uses those registers and instructions would crash and burn if run on a processor that doesn't have the extra bits of hardware needed to support those extensions.
So the optimizing compiler also adds a bit of runtime code that actually checks to see if the processor has the right extensions, and then gently fails the app if it runs on a non-supported processor. In the case of the Intel compiler, it fails with the message, "Fatal Error: This program is not built to run on the processor in your system."
So far, so good. But how does that little bit of runtime code see if the processor is compatible? It doesn't actually try executing the extensions. Instead, it uses the CPUID instruction to ask the processor to explain its specific capabilities.
Check the Badge
The CPUID (central processing unit identification) instruction lets operating systems and applications determine if they are compatible with the hardware platform, as well as choose the proper execution paths or runtime libraries to leverage specific processor features.
Depending on the input value placed in the EAX register, CPUID returns different sets of information; two of them matter here. If EAX contains 0, CPUID stores the processor vendor string in the EBX, EDX, and ECX registers (in that order). If EAX contains 1, CPUID returns a list of feature capability flags, chiefly in the ECX and EDX registers.
The processor vendor string is easy to analyze, but isn't hugely valuable to most developers. If CPUID is run on an Intel processor, the string returned is "GenuineIntel." If it's an AMD processor, it's "AuthenticAMD." (I don't have a system with a VIA processor, so I can't check the string.) Normally, the processor vendor string isn't important except for trivia. Windows, for example, will tell you the manufacturer of the processor from the My Computer -> Properties dialog.
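To make the vendor-string lookup concrete, here is a minimal sketch in C. It assumes a GCC- or Clang-style toolchain that ships the <cpuid.h> helper header and its __get_cpuid() function; the function names vendor_from_regs, vendor_is, and read_vendor are my own, invented for illustration. The unpacking is kept separate from the CPUID call so it can be checked with known register values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* assumption: GCC- or Clang-style toolchain */
#define HAVE_CPUID_H 1
#endif

/* Unpack one 32-bit register into four characters, low byte first. */
static void unpack_reg(uint32_t reg, char *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFFu);
}

/* Build the 12-character vendor string from the values CPUID leaves
   in EBX, EDX, and ECX (in that order) when called with EAX = 0. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    assert(out != NULL);
    unpack_reg(ebx, out);
    unpack_reg(edx, out + 4);
    unpack_reg(ecx, out + 8);
    out[12] = '\0';
}

/* Hypothetical helper: does the vendor string match a given name? */
int vendor_is(const char *vendor, const char *name)
{
    return strcmp(vendor, name) == 0;
}

/* Query the running processor. Returns 1 on success, or 0 if CPUID is
   not reachable through <cpuid.h> in this build. */
int read_vendor(char out[13])
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    vendor_from_regs(ebx, edx, ecx, out);
    return 1;
#else
    (void)out;
    return 0;
#endif
}
```

A caller would declare a 13-byte buffer, call read_vendor() on it, and print the result if the call returns 1. On other compilers the CPUID call itself would need a different intrinsic or inline assembly, but the unpacking logic stays the same.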
Far more important to an operating system and its applications (and in particular, to apps optimized for specific processor extensions) are the feature flags returned by calling CPUID with EAX=1. This loads the ECX and EDX registers with bits indicating the specific capabilities of the microprocessor, including which x86 extensions it supports. Intel and AMD use these bits in a consistent way.
For example, bit 23 in the EDX register indicates whether the processor supports the original 64-bit MMX instructions. Bit 25 indicates SSE and bit 26 is for SSE2. Bit 0 in the ECX register indicates support for the SSE3 instructions.
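Those bit positions translate directly into a few shifts and masks. The sketch below decodes only the four flags named above; the struct and function names (x86_features, test_bit, decode_features) are hypothetical, and the ECX and EDX arguments are assumed to be the values CPUID returned for EAX=1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four extensions discussed in the text. */
struct x86_features {
    bool mmx;
    bool sse;
    bool sse2;
    bool sse3;
};

/* Return true if bit n (0 = least significant) of reg is set. */
static bool test_bit(uint32_t reg, unsigned n)
{
    assert(n < 32);
    return ((reg >> n) & 1u) != 0;
}

/* Decode the feature flags that CPUID returns in ECX and EDX
   when it is called with EAX = 1. */
struct x86_features decode_features(uint32_t ecx, uint32_t edx)
{
    struct x86_features f;
    f.mmx  = test_bit(edx, 23);   /* EDX bit 23: MMX  */
    f.sse  = test_bit(edx, 25);   /* EDX bit 25: SSE  */
    f.sse2 = test_bit(edx, 26);   /* EDX bit 26: SSE2 */
    f.sse3 = test_bit(ecx, 0);    /* ECX bit 0:  SSE3 */
    return f;
}
```

Nothing in this decoding depends on who made the chip, which is the point: the same bits mean the same thing on Intel and AMD processors.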
Many current Intel and AMD processor models (and future VIA processors expected to be released this year) support MMX, SSE, and SSE2, and return the proper bits from CPUID. The latest Intel Xeon processors, and revision E (or later) of the AMD Opteron processor, set the SSE3 bit to 1; for earlier processors, it's set to 0.
(Detailed information about CPUID is in the AMD64 Architecture Programmer's Manual, Volume 3. In the February 2005 edition of this manual, the CPUID instruction is covered beginning on page 117. These flags are documented in tables 3-4 and 3-5, beginning on page 123.)
The expected behavior for an application compiled to use the SSE2 or SSE3 extensions is that the compiler would insert a library that can analyze processor capabilities. On application launch, that library would call CPUID with EAX containing a 1, and then check the appropriate bits to make sure the required extensions are supported. Only then would it allow execution to continue. That's the behavior for most compilers when using those types of optimizations.
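A launch-time check of that kind might look roughly like the following. This is a sketch of the expected behavior, not any vendor's actual runtime library: the enum, the function names, and the error text are invented, the CPUID call again assumes GCC or Clang's <cpuid.h>, and treating an SSE3 build as also requiring SSE2 is my assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* assumption: GCC- or Clang-style toolchain */
#define HAVE_CPUID_H 1
#endif

/* Hypothetical requirement levels a build might be compiled for. */
enum required_ext { NEED_NONE, NEED_SSE2, NEED_SSE3 };

/* Pure check: do the EAX=1 feature flags satisfy the requirement?
   Assumption: an SSE3 build also relies on SSE2. */
int flags_satisfy(enum required_ext need, uint32_t ecx, uint32_t edx)
{
    int has_sse2 = (int)((edx >> 26) & 1u);   /* EDX bit 26 */
    int has_sse3 = (int)(ecx & 1u);           /* ECX bit 0  */

    switch (need) {
    case NEED_NONE: return 1;
    case NEED_SSE2: return has_sse2;
    case NEED_SSE3: return has_sse2 && has_sse3;
    }
    assert(!"unknown requirement");
    return 0;
}

/* Call once at program start; exits if the processor lacks the
   required extensions. The vendor string is never consulted. */
void startup_check(enum required_ext need)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#ifdef HAVE_CPUID_H
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        ecx = edx = 0;          /* treat an unusable CPUID as "no extensions" */
#endif
    (void)eax;
    (void)ebx;
    if (!flags_satisfy(need, ecx, edx)) {
        fprintf(stderr, "This program requires processor extensions "
                        "that are not present.\n");
        exit(EXIT_FAILURE);
    }
}
```

An SSE2 build would call startup_check(NEED_SSE2) as the first statement in main().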
You should know, by the way, that this isn't what the Intel compiler's little bit of runtime code does. With the -xN/-QxN and -xP/-QxP flags set, it checks the processor vendor string—and if it's not "GenuineIntel," it stops execution without even checking the feature flags. Ouch!
Fortunately, there are workarounds. There are other compiler switches that you can use to tell the Intel compiler to use hardware extensions without checking the processor vendor string. However, it's unclear if those other switches perform the same level of optimization.
But that's what you have to do if you want to optimize code for processor extensions using the Intel compiler, and you want your code to run on non-Intel hardware. Those switches are, for example, -axN/-QaxN for SSE2 and -axP/-QaxP for SSE3.
The bad news is that the -axN/-QaxN and -axP/-QaxP switches tell the compiler to generate multiple code paths. The optimized SSE2 or SSE3 code paths are still vendor-checked; the difference is that there will be a generic (less optimized) fallback. The application will run, and you won't get the dreaded "Fatal Error." But the best possible code path won't run on a non-Intel processor. Your app runs... but it's not fully optimized. Still, it's better than not running at all.
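For contrast, here is a sketch of how a multiple-code-path dispatcher could pick its path from the feature flags alone. It is an illustration, not the code the Intel compiler generates: the enum and function names are hypothetical, and the ECX and EDX arguments are assumed to come from CPUID with EAX=1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the code paths in a multi-path binary. */
enum code_path { PATH_GENERIC, PATH_SSE2, PATH_SSE3 };

/* Pick the best code path using only the EAX=1 feature flags.
   Assumption: the SSE3 path also relies on SSE2. */
enum code_path choose_path(uint32_t ecx, uint32_t edx)
{
    int has_sse2 = (int)((edx >> 26) & 1u);   /* EDX bit 26 */
    int has_sse3 = (int)(ecx & 1u);           /* ECX bit 0  */

    if (has_sse2 && has_sse3)
        return PATH_SSE3;
    if (has_sse2)
        return PATH_SSE2;
    return PATH_GENERIC;        /* fallback that runs on any x86 chip */
}

/* Human-readable name for logging which path was selected. */
const char *path_name(enum code_path p)
{
    switch (p) {
    case PATH_GENERIC: return "generic";
    case PATH_SSE2:    return "sse2";
    case PATH_SSE3:    return "sse3";
    }
    assert(!"unknown code path");
    return "generic";
}
```

With a selector like this, a processor that reports SSE2 gets the SSE2 path regardless of its vendor string, and only a processor that genuinely lacks the extension falls back to the generic code.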
What Can You Do?
Currently, developers cannot use the Intel compilers to create applications whose optimized code paths will run on non-Intel processors. The only way to achieve runtime portability is to disable optimizations that might cause the binaries to fail on non-Intel processors. This reduces the value of an expensive optimizing compiler.
How can this be addressed? Developers could request that Intel change the behavior of its compilers, and use the industry-standard feature flags, not the proprietary processor vendor string, to determine when a processor is compatible with specific compiler optimizations and instruction-set extensions.
In the meantime, the best bet is to switch to a different family of compilers. As mentioned above, there are plenty of options available, from PathScale to Portland Group, from Microsoft to Sun, as well as open-source compilers from GNU. If you're trying to develop for an industry-standard architecture, you can't use tools that won't respect those standards.
Steve Westfield is a C/C++ programmer, consultant and video game addict who lives outside of Chicago.