This article explains how applications can use OS APIs and CPUID on a system with AMD processors to discover the number of logical and physical processors, the number of cores, and the association between cores and physical processors. Sample code that can be ported across different operating systems is provided.
The advent of multi-core x86 processors has increased interest in software parallelization. Introductory discussions of parallelization on AMD64 processors often begin by describing AMD’s Direct Connect Architecture. To sum up, the Direct Connect Architecture means that each physical AMD processor is a NUMA node. Each processor has one or more physical CPU cores, and those cores are directly connected through a high-speed memory controller to a physical bank of memory. Latency is lowest when accessing local memory and somewhat higher when accessing remote memory. At the OS level, each core is seen as a “logical processor.” Questions that may arise from this are:
- Which “logical processor” corresponds with which core?
- Which “logical processor” or core is associated with each physical processor?
- How many physical processors are in the system?
These questions also affect licensing, as some software is licensed by the number of CPU cores and some by the number of physical processors.
On current and legacy operating systems that run on x86-based processors, there is no common set of APIs that allows applications to discover the topology of a system. In general, there are APIs to discover how many logical processors exist, as well as to affinitize a thread to one or more logical processors. The example source program provided here shows how to combine these APIs with the CPUID instruction to answer the above questions. The program is expected to work on Linux®, Solaris™, and Windows® operating systems. You should be able to build it with gcc, cc, and Visual Studio. For convenience, we also supply a 32-bit console binary of the program compiled for Windows.
A basic understanding of CPUID and BIOS functionality is useful before proceeding.
This example is provided for illustrative purposes only. It is limited by the capabilities of the operating system and the permissions of the user running the program. For example, the use of processor sets on a Unix-based system is likely to change the output by restricting the set of available logical processors. In addition, this applies to current and near-future AMD processors only, and we make no assertions about how it will behave on future CPUs from AMD.
Of course, developers have a variety of other options to discover this kind of information. Newer versions of Windows (or older Windows versions with service packs) provide the GetLogicalProcessorInformation API. On Linux, one could write code to parse /proc/cpuinfo or possibly newer additions to the virtual /proc filesystem. Solaris provides the psrinfo and prtdiag utilities. The advantage of this example is portability across different operating systems. This is useful if you develop on one platform and deploy on another, or if you work on multiple platforms.
How does this program work? First, a call is made to an OS routine to supply the number of logical processors. Then the running thread pins itself to each logical processor in succession, and while pinned, it invokes CPUID a number of times. An array of structs is populated with the information from CPUID, one struct per logical processor. Then the array is scanned to create a map of physical processors.
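The count-and-pin loop described above can be sketched for one platform. This is a minimal, Linux-only illustration, not the article's sample program: the function names are hypothetical, it relies on the glibc calls sysconf and sched_setaffinity, and it assumes online logical processors are numbered contiguously from 0, which may not hold on every system. A Windows or Solaris build would need the equivalent calls for those systems.

```c
#define _GNU_SOURCE   /* must come before the includes: exposes sched_setaffinity and CPU_SET */
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Number of logical processors currently online, as reported by the OS. */
long count_logical_processors(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Pin the calling thread to one logical processor.
 * Returns 0 on success, -1 on failure (for example when a processor set
 * or container excludes that processor). */
int pin_to_logical_processor(int cpu)
{
    cpu_set_t mask;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);   /* 0 = calling thread */
}

/* Visit each logical processor in turn. The hypothetical `visit` callback is
 * where CPUID would be issued and one struct filled in per processor.
 * Returns the number of processors actually visited. */
long for_each_logical_processor(void (*visit)(int cpu, void *ctx), void *ctx)
{
    long n = count_logical_processors();
    long visited = 0;

    for (long cpu = 0; cpu < n && cpu < CPU_SETSIZE; ++cpu) {
        if (pin_to_logical_processor((int)cpu) != 0)
            continue;               /* not available to this process: skip it */
        visit((int)cpu, ctx);
        ++visited;
    }
    return visited;
}
```

Skipping processors that cannot be pinned, rather than aborting, reflects the caveat that processor sets and user permissions can restrict what the program sees.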
While affinitized to each core, we need to use CPUID to find out these crucial fields:
1. LocalApicId – each logical processor has its own APIC ID that uniquely identifies it within the system. This contains an identifier for each physical processor, as well as an identifier for each core within a processor. The core identifier always occupies the least significant bits of the APIC ID.
Obtain this with CPUID function Fn0000_0001_EBX. Set EAX to 0000_0001 before calling CPUID, and the value returned is in bits 31:24 of the EBX register.
2. ApicIdCoreIdSize – On current “non-legacy” processors, this is the number of least significant bits in the APIC ID that indicate the CPU core ID within a processor. On a legacy processor, this value is 0.
Obtain this with extended CPUID function Fn8000_0008_ECX, which provides physical core count information. Set EAX to 8000_0008 before calling CPUID, and the value returned is in bits 15:12 of the ECX register.
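The two field extractions above amount to a shift and a mask each. The sketch below shows them as plain functions on register values, plus a wrapper that issues CPUID through GCC's `<cpuid.h>` helper `__get_cpuid`. The function names are hypothetical, and the wrapper is compiled only for GCC-compatible compilers targeting x86; a Visual Studio build would need that compiler's own CPUID intrinsic instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GCC_CPUID 1
#endif

/* LocalApicId: bits 31:24 of EBX returned by CPUID Fn0000_0001. */
uint32_t local_apic_id_from_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xFFu;
}

/* ApicIdCoreIdSize: bits 15:12 of ECX returned by CPUID Fn8000_0008. */
uint32_t apic_id_core_id_size_from_ecx(uint32_t ecx)
{
    return (ecx >> 12) & 0xFu;
}

/* Read both fields on whichever logical processor the thread is running on.
 * Returns 1 on success, 0 if CPUID (or the requested function) is unavailable. */
int read_apic_fields(uint32_t *apic_id, uint32_t *core_id_size)
{
    assert(apic_id != NULL && core_id_size != NULL);
#ifdef HAVE_GCC_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x00000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    *apic_id = local_apic_id_from_ebx(ebx);

    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return 0;
    *core_id_size = apic_id_core_id_size_from_ecx(ecx);
    return 1;
#else
    return 0;   /* no CPUID helper available in this build */
#endif
}
```

Keeping the bit extraction separate from the CPUID call means the decoding can be checked with known register values on any machine.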
In the 8 bits available in LocalApicId, a specific number of least significant bits identify the individual core, and the remaining most significant bits identify the physical processor. The task then is to figure out how many bits to use for which piece. If ApicIdCoreIdSize is zero, we are on a legacy processor and the split cannot be read from this field; the CPUID specification describes how to derive it from the core count on such parts. Otherwise, the physical processor ID is obtained by shifting LocalApicId to the right by the number of bits specified by ApicIdCoreIdSize, which discards the core bits and leaves only the upper bits.
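As a worked illustration of the split, the sketch below separates an 8-bit APIC ID into its two parts. The function names are hypothetical, and the code covers only the non-legacy case where ApicIdCoreIdSize is nonzero; with a size of 0 it would treat the whole APIC ID as the physical processor ID, which is not the legacy rule.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bits of the APIC ID: shift the core bits out. */
uint32_t physical_processor_id(uint32_t local_apic_id, uint32_t core_id_bits)
{
    assert(core_id_bits <= 8);
    return (local_apic_id & 0xFFu) >> core_id_bits;
}

/* Lower bits of the APIC ID: keep only the core bits. */
uint32_t core_id(uint32_t local_apic_id, uint32_t core_id_bits)
{
    assert(core_id_bits <= 8);
    return local_apic_id & ((1u << core_id_bits) - 1u);
}
```

For example, with an APIC ID of binary 1011 and two core bits, the physical processor ID is binary 10 (2) and the core ID is binary 11 (3).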
The physical processor ID retrieved in this way may not start at 0, because the physical processor bits in the LocalApicId may be shifted up to account for IOAPIC devices. This is determined by BIOS and is discussed in the CPUID specification.
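Because the raw IDs may not start at 0, code that counts or indexes physical processors should not use them directly as array indices. One possible approach, sketched here with hypothetical names and not taken from the sample program, is to assign each distinct raw ID a dense index in order of first appearance while scanning the per-processor array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PACKAGES 256   /* an 8-bit APIC ID cannot name more packages than this */

struct package_map {
    uint32_t raw_id[MAX_PACKAGES];  /* raw physical processor IDs, in order of first appearance */
    size_t   count;                 /* number of distinct physical processors seen so far */
};

/* Return the dense index (0, 1, 2, ...) for a raw physical processor ID,
 * adding it to the map the first time it is seen. */
size_t package_index(struct package_map *map, uint32_t raw_id)
{
    assert(map != NULL);
    for (size_t i = 0; i < map->count; ++i) {
        if (map->raw_id[i] == raw_id)
            return i;
    }
    assert(map->count < MAX_PACKAGES);
    map->raw_id[map->count] = raw_id;
    return map->count++;
}
```

After every logical processor's raw ID has been passed through `package_index`, `count` holds the number of physical processors, and the returned indices group logical processors by physical processor regardless of where BIOS started the numbering.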