The new AMD “Istanbul” processors build on the foundation laid by the AMD “Barcelona” and “Shanghai” Family 10h processors with some key technology advancements. With “Barcelona,” we introduced an array of innovations in processor design and features, including native quad-core architecture and a new L3 cache shared across the processor cores. The AMD “Shanghai” release brought additional enhancements, including improved scalability and availability and a larger L3 cache. Now, with the release of the AMD “Istanbul” processor, there are even more enhancements for software developers: an even larger shared L3 cache, a total of six physical cores on die, a new probe filter called HT Assist to help increase bandwidth, and several new power features to keep the system running cool.
A number of software-visible features can be leveraged to make your applications perform better and scale across multiple cores. Visit this page regularly for updated information and practical guidance on how to take advantage of all the new features in the “Barcelona,” “Shanghai,” and “Istanbul” Family 10h processors.
» Software Development Tools and Resources
» Overview of Software Visible Features
» Documentation
» Technical Articles & Blogs
» Benchmarks and Performance Evaluations
» Related Resources
AMD Core Math Library (ACML)
ACML is specifically designed to support multi-threading and other key features of AMD’s next-generation processors. ACML currently supports OpenMP and features hand-tuned “Barcelona,” “Shanghai,” and “Istanbul” support for the SGEMM and DGEMM matrix multiplication routines and the CFFT complex-complex Fast Fourier Transforms. The newly released ACML 4.2.0 includes further tuning of DGEMM and improved performance on 3D FFTs.
AMD CodeAnalyst Performance Analyzer
“Shanghai” built upon the Instruction-Based Sampling (IBS) functionality originally introduced in “Barcelona” by adding a new mode of operation that enhances IBS op sampling. The original mode uses processor cycles to select ops for monitoring and sampling; the new mode instead counts ops as they are dispatched and uses the count to decide when an op should be selected. The new mode greatly improves the statistical distribution of profile data, which helps software developers interpret and apply IBS data. It is also supported in the “Istanbul” processor.
AMD CodeAnalyst Performance Analyzer also supports the small number of new performance events and event unit masks on “Shanghai” and “Istanbul” processors.
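To illustrate why counting dispatched ops can give a more even profile than counting cycles, the following C sketch simulates both selection schemes on a toy workload. Everything in it is an assumption made for illustration: the four-op loop, the stall lengths in `gap_cycles`, and the function names `sample_by_cycles` and `sample_by_op_count` are hypothetical, and the model is far simpler than the actual IBS hardware.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_KINDS 4

/* Toy workload (assumed, not measured): a repeating loop of four ops.
 * Each entry is the number of cycles between the previous dispatch and
 * this op's dispatch. Op kind 3 dispatches only after a 13-cycle stall. */
static const unsigned long gap_cycles[NUM_KINDS] = {1, 1, 1, 13};

/* Cycle-based selection: once `period` cycles have elapsed since the
 * last sample, the next op to dispatch is tagged. */
void sample_by_cycles(size_t total_ops, unsigned long period,
                      unsigned long hist[NUM_KINDS])
{
    assert(period > 0);
    unsigned long cycle = 0;
    unsigned long next_sample = period;

    for (size_t k = 0; k < NUM_KINDS; k++)
        hist[k] = 0;

    for (size_t i = 0; i < total_ops; i++) {
        size_t kind = i % NUM_KINDS;
        cycle += gap_cycles[kind];          /* dispatch time of this op */
        if (cycle >= next_sample) {
            hist[kind]++;                   /* tag this op */
            next_sample = cycle + period;   /* restart the cycle counter */
        }
    }
}

/* Op-count-based selection: every `period`-th dispatched op is tagged,
 * regardless of how many cycles elapsed in between. */
void sample_by_op_count(size_t total_ops, unsigned long period,
                        unsigned long hist[NUM_KINDS])
{
    assert(period > 0);
    unsigned long count = 0;

    for (size_t k = 0; k < NUM_KINDS; k++)
        hist[k] = 0;

    for (size_t i = 0; i < total_ops; i++) {
        size_t kind = i % NUM_KINDS;
        if (++count == period) {
            hist[kind]++;                   /* tag this op */
            count = 0;
        }
    }
}
```

With 2800 ops and a period of 7, the cycle-based histogram in this toy model puts every sample on op kind 3 (the op that follows the stall), while the op-count histogram spreads samples evenly over the four kinds. The period is deliberately chosen coprime to the loop length; a period that shares a factor with the loop length would alias in this simplified model.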
Framewave
The Framewave open source library is optimized to yield maximum performance on x86 and AMD64 hardware architectures. Current implementations exploit multicore architecture and single instruction multiple data (SIMD) instructions. Specifically, streaming SIMD extensions and AMD Family 10h technologies are used to optimize for speed. Please download the latest Framewave version from SourceForge to experience the best performance.
GNU Toolset
The GNU Toolset, including the GCC compiler, the glibc project, and the binutils, has been optimized for AMD Family 10h processors, including “Shanghai” and “Barcelona.”
Microsoft Visual Studio® compilers
The Visual Studio 2008 tools feature improved instruction selection, optimized register allocation, and enhanced 128-bit floating-point performance when used with AMD Third-Generation Opteron processors.
x86 Open64 Compiler Suite
The x86 Open64 compiler system is a high performance, production quality code generation tool designed for high performance parallel computing workloads. The x86 Open64 environment provides the developer the essential choices when building and optimizing C, C++, and Fortran applications targeting 32-bit and 64-bit Linux platforms.
PGI compilers
PGI compilers and tools enable maximum overall performance on multi-core AMD64 processors through auto-parallelization and OpenMP directive-based parallel programming. New options and optimizations improve Peak SPECCPU 2006 performance by 5-6% over the previous release 7.1 running on quad-core AMD Opteron processors.
Sun Studio compilers
The latest version of the Sun Studio compilers contains performance improvements to better support AMD’s “Barcelona” and “Shanghai” processors, including compiler optimization flags for best performing code.
Feature identification bits for new instructions
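As a rough illustration of how software might test for some of these features before using them, the C sketch below reads CPUID and decodes a few feature bits. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid`). The type and function names (`fam10h_features`, `decode_features`, `query_features`) are hypothetical helpers, not part of any AMD library, and the bit positions noted in the comments reflect our reading of the AMD CPUID specification, which should be consulted as the authoritative source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* Hypothetical helper type for this sketch. */
typedef struct {
    bool popcnt;        /* CPUID Fn0000_0001 ECX bit 23 */
    bool abm;           /* CPUID Fn8000_0001 ECX bit 5 (LZCNT) */
    bool sse4a;         /* CPUID Fn8000_0001 ECX bit 6 */
    bool misalign_sse;  /* CPUID Fn8000_0001 ECX bit 7 */
} fam10h_features;

/* Pure decoding step: takes the raw ECX values of the standard
 * (0x00000001) and extended (0x80000001) feature leaves. */
fam10h_features decode_features(uint32_t std_ecx, uint32_t ext_ecx)
{
    fam10h_features f;
    f.popcnt       = (std_ecx >> 23) & 1u;
    f.abm          = (ext_ecx >> 5) & 1u;
    f.sse4a        = (ext_ecx >> 6) & 1u;
    f.misalign_sse = (ext_ecx >> 7) & 1u;
    return f;
}

/* Queries the running processor. Returns false if CPUID or one of the
 * required leaves is unavailable (including on non-x86 builds). */
bool query_features(fam10h_features *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t std_ecx, ext_ecx;

    if (!__get_cpuid(1u, &eax, &ebx, &ecx, &edx))
        return false;
    std_ecx = ecx;

    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    ext_ecx = ecx;

    *out = decode_features(std_ecx, ext_ecx);
    return true;
#else
    return false;
#endif
}
```

A caller would typically run `query_features` once at startup and select a code path accordingly, for example falling back to a portable routine when `sse4a` or `abm` is false. Keeping the decoding separate from the CPUID call makes the bit logic easy to check with fixed inputs.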
New features in AMD’s upcoming Barcelona chip dramatically boost performance of floating-point arithmetic and greatly accelerate access to cache. » SSE128: AMD’s New Floating-Point Enhancements
Take advantage of the many architectural innovations in the "Barcelona" processor through Orcas-based tools and AMD libraries.» Develop Blazing Fast Code with Microsoft Visual Studio® 2008 (code-named “Orcas”) and AMD Tools
AMD’s new chip architecture extends a long tradition of giving developers the features they need to execute their code blindingly fast. What's in it for you?» Going to Barcelona: A Modern Architecture for Breakthrough Software Performance
AMD “Shanghai” (Family 10h) Processor Software Visible Features blog series
New “Istanbul” blogs
» “Shanghai” zone is now “Istanbul” zone
» “Istanbul” overview
Previous “Shanghai” blogs
» Transition from “Barcelona” to “Shanghai”
» Larger L3 Cache
» Improved Reliability, Availability, Scalability
Previous “Barcelona” blogs
» Welcome
» Shared L3 Cache
» CPUID
» Instruction-Based Sampling (IBS)
» MONITOR/MWAIT
» SSE Misaligned Access
» SSE4a Instruction Set, Part 1
» SSE4a Instruction Set, Part 2
» Sideband Stack Optimizer
» 128-bit FPU
» Advanced Bit Manipulation (ABM)
Shanghai-based Dell Systems take top scores for VMmark 8 core and 16 core systems.» http://www.vmware.com/products/vmmark/results.html
This VMware performance white paper evaluating RVI performance with the Shanghai processor concludes that "the current VMware VMM leverages these features quite well, resulting in performance gains of up to 42% for MMU-intensive benchmarks and up to 500% for MMU-intensive microbenchmarks."» http://www.vmware.com/resources/techresources/1079
HP ProLiant DL585 G5 earns #1 virtualization performance record on VMmark benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/proliant_dl585_vmmark_080408.pdf
The very first independent Nested Paging Virtualization tests (2 socket servers running Xen with database and web serving workloads and featuring AMD-V (RVI)).» http://www.anandtech.com/weblog/showpost.aspx?i=467
“Jaguar,” the AMD Opteron-based system by Cray at Oak Ridge National Labs, is the first entirely x86-based system to break the Petaflop barrier. » http://www.marketwatch.com/news/story/Cray-Supercomputer-Oak-Ridge-Smashes/story.aspx?guid=%7B25D20E9B-D6BD-4CA5-B7F6-3484D9616D7C%7D
HP ProLiant DL585 G5 and DL385 G5 AMD Opteron servers lead with 4P, 2P world record performances on the SPECweb®2005 Benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant_dl585_385_specweb2006_073008.pdf
(Please note that Dual-Core AMD Opteron processors also hold the SPECWeb2005 performance records for 2P and 4P servers.)
An 8 socket Shanghai-based HP system achieves the top x86-based score with Oracle and a 2 socket Shanghai-based HP system achieves the top x86-based score with SQL Server 2005.» http://www.sap.com/solutions/benchmark/sd2tier.epx
AnandTech is "quite surprised that Shanghai was able to meet and, in some cases, pass Harpertown at various workload levels in some of the benchmarks." » http://www.anandtech.com/showdoc.aspx?i=3456&p=7
HP ProLiant DL585 G5 with Quad-Core AMD Opteron processors takes #1 4-socket worldwide price/performance record again on TPC-C benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant%20dl585_tpc_080208.pdf
HP ProLiant DL785 G5 achieves #1 8P non-clustered performance and price/performance on TPC-H@300GB benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl785g5-tpch300gb-0708.pdf
HP ProLiant BL465c G5 server blade posts HP’s first Quad-Core AMD Opteron™ blade result on Oracle Applications Standard Benchmark (small model, single DB instance).» ftp://ftp.compaq.com/pub/products/servers/benchmarks/hp_proliant_bl460c%20_siebel_perf_brief_051408.pdf
HP ProLiant DL585 G5 achieves #1 4-processor Windows result on two-tier SAP® Sales and Distribution Standard Application Benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl585g5_2tsapsd_071408.pdf
HP ProLiant DL785 G5 takes #1 8-processor Windows result with new Quad-Core AMD Opteron™ processors on two-tier SAP® Sales and Distribution Standard Application Benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl785g5_2tsapsd_may08.pdf
HP ProLiant servers show excellent performance scalability with new Quad-Core AMD Opteron processors on two-tier SAP® Sales and Distribution (SD) Standard Application Benchmark (2 socket and 4 socket blades and servers).» ftp://ftp.compaq.com/pub/products/servers/benchmarks/HP_ProLiant_DL385_BL685c_2tSAPSD_March2708.pdf
Quad-Core AMD Opteron processor-based Sun X4600 server sets x86 SPECjbb2005 world record (8 socket server).» http://www.sun.com/aboutsun/pr/2008-08/sunflash.20080807.1.xml
HP ProLiant DL585 G5 server with latest Quad-Core AMD Opteron™ processors takes overall x86_64 records on SPEC® CPU2006 benchmark.» ftp://ftp.compaq.com/pub/products/servers/benchmarks/dl585_g5_speccpu2006_july08.pdf