Why Virtualization Runs Faster on AMD Opteron™ Processors 
VMware’s posted benchmark results prove it—Virtualization hosts that run AMD processors consistently turn in better performance than those running other x86 processors. What is the magic sauce and how does it work?
Tim Mueting, Product Manager, Virtualization Solutions  1/27/2009 

As virtualization enjoys wider adoption in data centers, hardware vendors are starting to bring out systems that are purpose-designed as virtualization hosts. Systems such as the R805 and R905 servers from Dell share several key features that help virtualization software run well: fast processors, a large complement of RAM, and expansive I/O capabilities, especially network bandwidth. (While I refer in the article to the Dell systems because I have used them, it’s important to note that the HP BladeSystem BL495c and the HP ProLiant DL385 G5p systems are also optimized for virtualized workloads.) This article discusses these traits, and why, on comparably configured systems, those based on AMD Opteron processors have been shown to consistently outperform systems using other x86 processors.

Virtualization hosts are servers that run a hypervisor, the software foundation that in turn runs virtual machines (VMs). Most hosts will run anywhere from a handful to 30 or so VMs. Some hosts run more than 30 VMs, but they are a rarity in IT today. To keep lots of VMs running well, a host needs the three key features listed above. Let’s look at these in a bit more detail.

  • Lots of RAM. When a VM is started, the full complement of RAM specified in its configuration is generally allocated and kept locked up until the VM is shut down. This might appear a waste of RAM, but in fact it helps improve performance by putting all the VM’s memory in a single allocated block, where possible. This design also guarantees that the VM will have all the RAM it was configured to have. The downside of this so-called “greedy” allocation is that hosts that run more than a few VMs need lots of RAM. 16GB is the very lowest end of the RAM scale for virtualization hosts, but 32GB is a more sensible minimum.
  • Network I/O Capacity. Most hosts do not store the VMs’ data locally. The reason is clear: scalability. If a host were to allow VMs to use local drives, one or two disk-intensive VMs could starve all the other VMs of access to disk. Instead, the data is stored remotely in a SAN or other enterprise storage. To move that data back and forth, big network pipes are needed. These can be GbE cards or Fibre Channel adapters. To this end, the Dell R905 servers have four embedded GbE adapters and seven PCIe slots for various adapters, including those for Fibre Channel.
  • Efficient Processors. When multiple VMs are running simultaneously, they share the processors on the host. If there are just a few VMs running, this resource competition can be handled by the cores in the processor, with one or two cores assigned to each VM, depending on need and availability. When there are more VMs than cores available, the hypervisor must shuttle VMs in and out, giving each one brief access to a core. To do this well, processors must not only be fast; they must also perform certain functions, especially memory management, very efficiently to deliver the needed performance to VMs.
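
The sizing logic in the list above can be sketched as simple arithmetic. The following Python fragment is only an illustration: the function names and the 2GB reserved for the hypervisor itself are assumptions made for the example, not figures from any vendor. It assumes every VM's configured RAM is fully reserved at startup, as described in the first bullet.

```python
def max_vms_by_ram(host_ram_gb, vm_ram_gb, hypervisor_overhead_gb=2.0):
    """Number of VMs that fit when each VM's configured RAM is fully reserved.

    hypervisor_overhead_gb is a hypothetical allowance for the hypervisor
    itself; real overhead depends on the product and configuration.
    """
    usable_gb = host_ram_gb - hypervisor_overhead_gb
    if usable_gb <= 0 or vm_ram_gb <= 0:
        return 0
    return int(usable_gb // vm_ram_gb)


def vcpu_to_core_ratio(vm_count, vcpus_per_vm, physical_cores):
    """Virtual CPUs per physical core; above 1.0 the hypervisor must time-slice."""
    return vm_count * vcpus_per_vm / physical_cores


if __name__ == "__main__":
    # With 2GB VMs, a 16GB host holds far fewer VMs than a 32GB host.
    for host_ram in (16, 32):
        print(host_ram, "GB host:", max_vms_by_ram(host_ram, 2), "VMs of 2GB each")
    # 30 single-vCPU VMs on a 16-core host means cores must be shared.
    print("vCPU-to-core ratio:", vcpu_to_core_ratio(30, 1, 16))
```

Once the ratio rises above 1.0, there are more virtual CPUs than cores, which is the point at which the hypervisor has to start shuttling VMs on and off the processor.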

To handle the processor sharing, networking I/O, and RAM partitioning, many enterprise-scale hypervisors, such as the one in VMware’s VI3, take over many functions normally handled by the operating system. In fact, the VI3 hypervisor is installed on the bare metal of the host system and runs its own kernel; in the ESX Server edition, a Linux®-based service console is used for management. (A link to download a version of the VMware ESXi server, which is the hypervisor running in VI3, appears in the Resources section at the end of this article, along with links to other downloadable hypervisors.)

Hypervisors are complex pieces of software that rely heavily on the hardware below them. That is why AMD and other x86 processor vendors have added technology to the processor silicon to facilitate certain difficult operations. Among their key benefits, these hardware extensions helped hypervisors solve certain permissions problems relating to the processor privilege ring in which their operations occurred. This advance enabled virtualization performance to really take off, bringing it close enough to native performance that the difference was no longer a bar to acceptance.

AMD’s Virtualization-Friendly Memory Management

AMD Opteron processors have long been distinguished by having a memory controller directly on the chip. This design permits the processor to resolve its own memory fetches rather than send that request to a chipset component that then fetches the data and sends it back over a memory bus. The benefits of the AMD design increase proportionately when a system has multiple cores, which is a configuration that is de rigueur on virtualization hosts.

On non-AMD x86 designs, memory requests from all the processors are funneled along a shared memory bus, which can cause significant delays if memory access is substantial. In the AMD design, each chip handles its own requests if the memory addresses are within the area of RAM for which the processor is responsible. If the address is outside the chip’s designated RAM, the request is routed via a high-speed interconnect to the appropriate memory controller. In terms of system architecture, the AMD processor-based design is referred to as NUMA, or non-uniform memory access. It’s non-uniform in the sense that memory fetches within a processor’s dedicated RAM complete faster than fetches from RAM attached to another processor, because local fetches go straight through the on-chip memory controller.

Today’s hypervisors make full use of the NUMA design of AMD systems. To the extent possible, they place a VM within the RAM managed by a single processor, so that memory accesses made by that VM are handled locally by that processor’s memory controller. This step helps reduce the latency of memory fetches when compared with the memory-bus approach.
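
To make the placement idea concrete, here is a minimal sketch of NUMA-aware placement. It does not describe how any particular hypervisor schedules VMs; the `NumaNode` class, the `place_vm` function, and the "most free memory first" policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NumaNode:
    """One processor plus the RAM attached to its memory controller."""
    node_id: int
    free_gb: float
    vms: list = field(default_factory=list)


def place_vm(nodes, vm_name, vm_ram_gb):
    """Place a VM on a single node when possible; return the node ids used."""
    # Preferred case: the whole VM fits in one node, so all accesses are local.
    candidates = [n for n in nodes if n.free_gb >= vm_ram_gb]
    if candidates:
        best = max(candidates, key=lambda n: n.free_gb)
        best.free_gb -= vm_ram_gb
        best.vms.append(vm_name)
        return [best.node_id]

    # Fallback: spread the VM across nodes; some accesses become remote.
    if sum(n.free_gb for n in nodes) < vm_ram_gb:
        raise MemoryError(f"not enough free RAM for {vm_name}")
    remaining = vm_ram_gb
    used = []
    for node in sorted(nodes, key=lambda n: n.free_gb, reverse=True):
        if remaining <= 0:
            break
        take = min(node.free_gb, remaining)
        if take > 0:
            node.free_gb -= take
            node.vms.append(vm_name)
            used.append(node.node_id)
            remaining -= take
    return used


if __name__ == "__main__":
    nodes = [NumaNode(0, 8), NumaNode(1, 8)]
    print(place_vm(nodes, "vm-a", 6))  # fits on one node
    print(place_vm(nodes, "vm-b", 6))  # fits on the other node
    print(place_vm(nodes, "vm-c", 3))  # must span both nodes
```

The third VM in the usage example shows the case a hypervisor tries to avoid: once no single node has enough free RAM, part of the VM's memory is reachable only across the interconnect.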

AMD’s Additional Secret Sauce

There are several other hardware features that enable the hypervisor to run fast on Quad-Core AMD Opteron processors. These are recent additions to AMD Virtualization (AMD-V™) technology, the company’s set of hardware features for improving virtualization performance. The first of these is nested paging, which requires a little explanation.

The memory addresses that programs use on x86 systems are not simply sequential byte numbers that point to a specific physical byte. Rather, the bits of an address are divided into fields that index a sequence of tables, which together identify a specific page in memory, plus the byte number within that page. If that page is currently in memory, it is accessed; otherwise it is paged in from the device on which it is stored. This scheme enables pages to be swapped to disk and brought back in as needed. (And it means that more memory pages can exist than actual physical memory to hold them, precisely because of the swapping function.) If addresses could refer only to a physical byte number, then memory could not be swapped in and out.
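
As an illustration of how an address breaks down into table indexes, the sketch below splits a 48-bit x86-64 virtual address into its fields, assuming 4KB pages and four levels of page tables (four 9-bit indexes followed by a 12-bit offset). The function name is made up for this example; other page sizes and paging modes use different field widths.

```python
def split_virtual_address(va):
    """Split a 48-bit x86-64 virtual address (4KB pages) into its fields.

    Returns (pml4_index, pdpt_index, pd_index, pt_index, page_offset).
    Each index selects one of 512 entries in the table at that level.
    """
    page_offset = va & 0xFFF          # bits 0-11: byte within the 4KB page
    pt_index = (va >> 12) & 0x1FF     # bits 12-20: page table entry
    pd_index = (va >> 21) & 0x1FF     # bits 21-29: page directory entry
    pdpt_index = (va >> 30) & 0x1FF   # bits 30-38: page-directory-pointer entry
    pml4_index = (va >> 39) & 0x1FF   # bits 39-47: top-level table entry
    return pml4_index, pdpt_index, pd_index, pt_index, page_offset


if __name__ == "__main__":
    address = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
    print(split_virtual_address(address))  # (1, 2, 3, 4, 5)
```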

The process of resolving addresses by stepping through entries in page tables (known as page walking) is time-consuming. In virtualization contexts, page walking is particularly complex. The hypervisor has to interrupt the walking at various points and track a separate set of so-called “shadow pages” so that it can accurately and quickly map the VM’s desired memory location to one that the hypervisor can retrieve. In some situations, the overhead of managing address resolution can represent more than 50% of the hypervisor’s overhead.

To reduce this overhead, AMD-V™ introduced Rapid Virtualization Indexing (RVI), also known as nested page tables (NPT). The NPT is an additional set of tables, consulted by the hardware during a page walk, that maps the VM’s view of physical memory to actual physical addresses on the host. By adding the NPT to the page walk, AMD-V removes the need for the hypervisor to use shadow pages to manage the VM-to-physical address resolution. As a result, addresses often resolve faster.
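
The two-stage mapping can be sketched in a few lines. This is a deliberately simplified model, not the hardware algorithm: each table is a flat Python dictionary from page number to frame number, whereas real guest and nested tables are both multi-level structures walked by the processor. The names `translate` and `nested_translate` are hypothetical.

```python
PAGE_SIZE = 4096


def translate(table, address):
    """Map an address through a flat table of {page number: frame number}."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page fault at address {address:#x}")
    return table[page] * PAGE_SIZE + offset


def nested_translate(guest_table, nested_table, guest_virtual):
    """Resolve a guest virtual address to a host physical address.

    Stage 1 uses the table the guest OS maintains; stage 2 uses the nested
    table the hypervisor maintains. No shadow copy is needed because the
    two tables are applied one after the other.
    """
    guest_physical = translate(guest_table, guest_virtual)
    return translate(nested_table, guest_physical)


if __name__ == "__main__":
    guest_table = {0: 5}     # guest virtual page 0 -> guest physical frame 5
    nested_table = {5: 42}   # guest physical frame 5 -> host physical frame 42
    print(hex(nested_translate(guest_table, nested_table, 0x10)))  # 0x2a010
```

The design point is that the guest OS can update its own table freely, and the hypervisor only has to maintain the second table, instead of intercepting guest updates to keep a shadow copy in sync.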

The page-lookup process can be shortcut if the page containing the necessary address has recently been accessed. A small cache, given the eye-glazing name of translation look-aside buffer, or TLB, contains the addresses of recently loaded pages. If the page address is found in the TLB, it need not be resolved and the TLB entry can be used to access the page.

The idea behind the TLB is that a running program is likely to access memory in the same general area repeatedly, so caching the addresses of recently examined pages means that additional memory accesses from the program can be handled quickly without the more costly page-walking process.
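
A toy model shows the effect. The class below caches page-to-frame translations and calls a slower `walk` function only on a miss. The least-recently-used eviction policy and the tiny capacity are assumptions for the example; actual TLB sizes and replacement policies vary by processor.

```python
from collections import OrderedDict


class TLB:
    """Small translation cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> frame number
        self.hits = 0
        self.misses = 0

    def lookup(self, page, walk):
        """Return the frame for a page, calling walk(page) only on a miss."""
        if page in self.entries:
            self.entries.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return self.entries[page]
        self.misses += 1
        frame = walk(page)  # the costly page walk
        self.entries[page] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return frame


if __name__ == "__main__":
    tlb = TLB(capacity=2)
    for page in (1, 1, 2, 3, 1):
        tlb.lookup(page, walk=lambda p: p + 100)
    print("hits:", tlb.hits, "misses:", tlb.misses)  # hits: 1 misses: 4
```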

With virtualization, however, regular TLBs offer diminished value. As long as a hypervisor is dealing with the needs of just one VM, the TLB will provide lift. But a hypervisor is frequently managing memory for multiple VMs at once, so as pages for other VMs are accessed, they evict entries for the current VM from the TLB. As a result, the hypervisor finds fewer of the entries it needs in the TLB. To help keep as many useful TLB entries available as possible, AMD-V provides expanded TLBs (48 entries in the L1 TLB and 128 to 512 entries in the L2 TLB, depending on the size of the pages). In addition, tagged TLBs can cache the pointers to individual VMs’ paging table entries using VM-specific tags, thereby maximizing the number of useful TLB entries immediately available for a given VM when the hypervisor switches to it. The net effect is that the time spent by the hypervisor resolving VM memory accesses is greatly diminished. (Links to whitepapers on RVI (or NPT) appear in the Resources section at the end of this article.)
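
The benefit of tagging can be sketched with a small simulation. The model below is an assumption-laden illustration, not a description of the actual hardware: it treats an untagged TLB as one that must be emptied whenever the processor switches VMs (because an untagged entry cannot say which VM it belongs to), and a tagged TLB as one keyed by a (VM, page) pair. The function name and the LRU policy are hypothetical.

```python
from collections import OrderedDict


def count_tlb_misses(accesses, capacity, tagged):
    """Count TLB misses for a sequence of (vm_id, page) accesses."""
    tlb = OrderedDict()
    misses = 0
    current_vm = None
    for vm_id, page in accesses:
        if not tagged and vm_id != current_vm:
            tlb.clear()  # untagged: entries are ambiguous across VMs
        current_vm = vm_id
        key = (vm_id, page) if tagged else page
        if key in tlb:
            tlb.move_to_end(key)  # hit: mark as most recently used
        else:
            misses += 1
            tlb[key] = True
            if len(tlb) > capacity:
                tlb.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    # Two VMs alternate, each touching the same two pages on every turn.
    pattern = [("A", 1), ("A", 2), ("B", 1), ("B", 2)] * 2
    print("untagged misses:", count_tlb_misses(pattern, 8, tagged=False))  # 8
    print("tagged misses:  ", count_tlb_misses(pattern, 8, tagged=True))   # 4
```

In this toy workload the tagged TLB misses only on the first touch of each page, while the untagged one starts cold after every switch. A larger TLB helps the tagged case further, since entries for several VMs must share the same capacity.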

Results

AMD chips offer excellent performance, due in part to their elegant solutions to the problems of memory management and address resolution. The chips’ advantage is particularly noticeable in contexts where memory access constitutes a large part of program operations, as in the case of hosting VMs. The effects of AMD’s designs can be seen in the VMmark benchmark (see Resources) for quad-processor systems, where similarly configured systems with AMD processors consistently finish in the lead. VMmark is a vendor-neutral benchmark designed by VMware; runs must follow narrow strictures, and results are reviewed by VMware before posting. (Design of a virtualization benchmark by SPEC®, the vendor consortium for benchmarks, has been underway for a while. A SPEC benchmark is expected sometime in 2009.) If you look at the posted VMmark results, especially for the basic workhorse of virtualization hosting, the 16-core systems, you will see that, at the time of this writing, AMD processors drive the systems that post the top scores. They also post the top scores in 2-processor systems.

So if your site is embracing virtualization for running desktops or enterprise applications, or for testing applications under development, give your VMs the maximum performance edge by using AMD processors.

Resources

VMmark Results posted at VMware:
http://www.vmware.com/products/vmmark/results.html

The portal at AMD that covers the AMD-V virtualization technology:
http://www.amd.com/us-en/0,,3715_15781,00.html?redir=SWOP08

VMware’s discussion of AMD RVI:
http://www.vmware.com/resources/techresources/1079

Anandtech’s discussion of AMD RVI:
http://www.anandtech.com/weblog/showpost.aspx?i=467

Freely Downloadable Hypervisors

A free copy of VMware’s ESXi hypervisor:
http://www.vmware.com/products/esxi/

Citrix XenServer is another free hypervisor that runs on x86 systems:
http://citrix.postclickmarketing.com/Producer.aspx?sid=12&sky=FM56IIKB&pgi=351&pgk=FNJVFW6S&rid=190625&rky=2QYQGSJY&tky=128737674887543750

The open-source version of Xen, on which Citrix XenServer is based, is available here:
http://www.xen.org/

KVM is a Linux-based hypervisor solution that is free and open source:
http://kvm.qumranet.com/kvmwiki

© 2010 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron, AMD Athlon, AMD Turion, AMD Sempron, AMD Phenom, ATI Radeon, Catalyst, AMD LIVE!, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Linux is a registered trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.