
The AMD processor formerly codenamed “Magny-Cours” (part of the Family 10h processor family) introduces key technology advancements that build on the foundation laid by its predecessors, formerly codenamed “Barcelona”, “Shanghai” and “Istanbul”. With “Barcelona”, we introduced an array of innovations in processor design and features, including a native quad-core architecture and a new L3 cache shared across the processor cores. The “Shanghai” release brought additional enhancements, including improved scalability and availability and a larger L3 cache. The “Istanbul” processor provided even more enhancements for software developers, such as six physical cores on a single die, a new probe filter called HT Assist to help increase bandwidth, several new power features, and I/O virtualization. “Magny-Cours” adds even more cores, for a total of up to 12 per processor, and enhances features such as power management, virtualization and Direct Connect Architecture.

A number of software-visible features can be leveraged to make your applications perform better and scale across multiple cores. Visit this page regularly for updated information and practical guidance on how to take advantage of all the new features in the latest Family 10h processors.

Software Development Tools and Resources

The following software development tools and resources have been optimized for Family 10h processors:

AMD Core Math Library (ACML)

ACML is specifically designed to support multi-threading and other key features of AMD’s next-generation processors. ACML currently supports OpenMP and provides BLAS matrix multiplication routines and CFFT complex-to-complex Fast Fourier Transforms that are hand-tuned for “Barcelona”, “Shanghai”, “Istanbul” and “Magny-Cours”. The newly released ACML 4.4.0 includes further tuning of ZGEMM and of the real-to-complex FFTs.

GNU Toolset

The GNU toolset, including the GCC compiler, the glibc C library and the binutils, has been optimized for AMD Family 10h processors.

Microsoft Visual Studio® compilers

The Visual Studio 2008 tools feature improved instruction selection, optimized register allocation, and enhanced 128-bit floating-point performance when used with AMD Family 10h processors.

x86 Open64 Compiler Suite

The x86 Open64 compiler system is a high-performance, production-quality code generation tool designed for high-performance parallel computing workloads. It gives developers the essential options for building and optimizing C, C++ and Fortran applications that target 32-bit and 64-bit Linux platforms.

Overview of Software Visible Features

Feature flags for Family 10h functions introduced in previous releases:

  • Fire-and-forget dynamic OS P-state support
  • Misaligned SSE access
  • OS-visible workaround register
  • Instruction-based sampling
  • SVM lock
  • Nested Paging
  • L3 cache size
  • 128-bit FPU

Feature identification bits for new instructions

  • MONITOR/MWAIT
  • LZCNT
  • POPCNT
  • SSE4a Instructions

Documentation

Technical Articles & Blogs

There are several new power and virtualization features, but the most prominent change is the increase to 8 or 12 cores per processor, made possible by our Direct Connect Architecture. This technical article outlines the enhancements and how they can benefit your code.

Five years ago, AMD shook up the x86 processor market by putting the memory controller directly on-chip. Now, AMD breaks new ground again with an innovative cache strategy.

New features in AMD’s upcoming Barcelona chip dramatically boost performance of floating-point arithmetic and greatly accelerate access to cache.

Take advantage of the many architectural innovations in the “Barcelona” processor through Orcas-based (Visual Studio 2008) tools and AMD libraries.

AMD (Family 10h) Processor Software Visible Features blog series

“Magny-Cours” blogs

Previous “Istanbul” blogs

Previous “Shanghai” blogs

Previous “Barcelona” blogs


Benchmarks and Performance Evaluations

Virtualization

This VMware performance white paper evaluating RVI performance with the Shanghai processor concludes that “the current VMware VMM leverages these features quite well, resulting in performance gains of up to 42% for MMU-intensive benchmarks and up to 500% for MMU-intensive microbenchmarks.”

HP ProLiant DL585 G5 earns #1 virtualization performance record on VMmark benchmark.

The very first independent nested paging virtualization tests: two-socket servers featuring AMD-V (RVI), running Xen with database and web-serving workloads.

HPC

“Jaguar,” the AMD Opteron-based Cray system at Oak Ridge National Laboratory, is the first entirely x86-based system to break the petaflop barrier.

Web Serving

HP ProLiant DL585 G5 and DL385 G5 AMD Opteron servers lead with 4P and 2P world-record performances on the SPECweb®2005 benchmark.

Database

An 8-socket Shanghai-based HP system achieves the top x86-based score with Oracle, and a 2-socket Shanghai-based HP system achieves the top x86-based score with SQL Server 2005.

AnandTech is “quite surprised that Shanghai was able to meet and, in some cases, pass Harpertown at various workload levels in some of the benchmarks.”

HP ProLiant DL585 G5 with Quad-Core AMD Opteron processors takes #1 4-socket worldwide price/performance record again on TPC-C benchmark.

Business Applications

HP ProLiant BL465c G5 server blade posts HP’s first Quad-Core AMD Opteron™ blade result on Oracle Applications Standard Benchmark (small model, single DB instance).

HP ProLiant DL585 G5 achieves #1 4-processor Windows result on two-tier SAP® Sales and Distribution Standard Application Benchmark.

HP ProLiant DL785 G5 takes #1 8-processor Windows result with new Quad-Core AMD Opteron™ processors on two-tier SAP® Sales and Distribution Standard Application Benchmark.

HP ProLiant servers show excellent performance scalability with new Quad-Core AMD Opteron processors on the two-tier SAP® Sales and Distribution (SD) Standard Application Benchmark (2-socket and 4-socket blades and servers).

Java Application Serving

Quad-Core AMD Opteron processor-based Sun X4600 server sets x86 SPECjbb2005 world record (8-socket server).

Floating Point Performance

HP ProLiant DL585 G5 server with latest Quad-Core AMD Opteron™ processors takes overall x86_64 records on SPEC® CPU2006 benchmark.

Related Resources