128-Bit SSE5 Instruction Set
SIMD (single instruction, multiple data) instructions, also called packed instructions, are widely used in high-performance computing (HPC), multimedia, and security applications. These instructions operate on a set of packed data values simultaneously. The popular SIMD instruction set extensions in the x86 architecture are called SSE (Streaming SIMD Extensions) and range from SSE1 (or simply SSE) to SSE5. Many of these instructions operate on multiple data elements (e.g., a vector) packed into a 128-bit wide register.
Streaming SIMD Extension 5 (SSE5) is a new extension to the AMD64 (x86-64) instruction set. SSE5 adds 170 new instructions and will be available starting with the Bulldozer processor core, due to be released in 2009. These instructions are intended to deliver greater benefits in domains such as HPC, multimedia, and security applications than previously released SSE instruction sets.
SSE5 instructions typically operate on 128 bits of data at a time, as do previously released SSE instruction sets. The new instructions aim to increase the work done per instruction. By introducing an additional operand, they also remove the overhead of storing and reloading register operands.
The new instructions include:
- Fused multiply accumulate (FMACxx) instructions
- Integer multiply accumulate (PMAC, PMADC) instructions
- Permutation and conditional move instructions
- Vector compare and test instructions
- Precision control, rounding, and conversion instructions
» AMD64 Technology 128-Bit SSE5 Instruction Set
Please send feedback to: SSE5.feedback@amd.com.
Image Converter
Consider a simple multimedia application, for example an image converter that converts a BMP image to a YUV image format. This involves reading individual pixels from the BMP image and converting them into YUV format. Packing the pixels and operating on a set of them with a single instruction, instead of operating on individual pixels, results in higher performance. This is an example where SSE instructions can give a performance boost. Assume the bitmap image consists of 8-bit monochrome pixels. By packing these pixel values into a 128-bit register (8 bits * 16 pixels), we can operate on 16 values at a time. Please refer to the AMD SSE5 specification for comprehensive details on SSE instructions.
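The packing idea can be sketched in C with SSE2 intrinsics. This is a minimal illustration rather than the converter itself: the function name `brighten16` is hypothetical, and a saturating brightness offset stands in for the per-pixel conversion arithmetic, which the text does not spell out. When SSE2 is not available at compile time, the sketch falls back to a scalar loop with the same result.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: add `delta` to 16 8-bit monochrome pixels,
   clamping each result at 255. */
void brighten16(const uint8_t *src, uint8_t *dst, uint8_t delta)
{
#if defined(__SSE2__)
    /* Load 16 pixels (16 * 8 bits = 128 bits) into one register. */
    __m128i pixels = _mm_loadu_si128((const __m128i *)src);
    /* Copy delta into all 16 byte lanes. */
    __m128i offset = _mm_set1_epi8((char)delta);
    /* One unsigned saturating add processes all 16 pixels. */
    __m128i result = _mm_adds_epu8(pixels, offset);
    _mm_storeu_si128((__m128i *)dst, result);
#else
    /* Scalar fallback: one pixel per iteration. */
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}
```

A full image would be processed by calling such a helper once per group of 16 pixels, with a scalar loop for any leftover pixels at the end of a row.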
FMADDPS – Multiply and add packed single precision floating point instruction
One of the typical operations computed in transformations such as the DFT or FFT is of the form

    $$p = \sum_{n=0}^{N-1} f(n)\,x(n)$$

Let f(n) and x(n) be two source buffers, for example src1 and src2, and let p be the destination that accumulates the results. All the buffers in this discussion are of floating point type. The implementation in plain C for N = 4 (four 32-bit values, 128 bits in total) is as follows:

    for (int i = 0; i < 4; i++) {
        p = p + src1[i] * src2[i];
    }

The code generated in x86 instructions per iteration is as follows:

    // src1 is on the top of the stack; src1 = src1 * src2
    fmul DWORD PTR _src2$[esp+148]
    // p = ST(1), src1 = ST(0); ST(1) = ST(0) + ST(1); ST = stack top
    faddp ST(1), ST(0)
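The total number of instructions generated for 4 iterations is 2 * 4 = 8. With the packed instructions available up to SSE2, the same calculation takes two instructions for all four elements:

    // xmm0 = p, xmm1 = src1, xmm2 = src2
    mulps xmm1, xmm2    // xmm1 = xmm1 * xmm2
    addps xmm0, xmm1    // xmm0 = xmm0 + xmm1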
However, SSE5 accomplishes the same computation in a single instruction:

    // xmm0 = p, xmm1 = src1, xmm2 = src2
    fmaddps xmm0, xmm1, xmm2, xmm0    // xmm0 = xmm1 * xmm2 + xmm0
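For readers who prefer C to assembly, the two approaches can be sketched as follows. This is an illustration under stated assumptions, not AMD's code: the function names `mul_then_add4` and `madd4_model` are hypothetical. The first function uses the standard SSE intrinsics for MULPS and ADDPS. The second is a plain C model of a four-operand multiply-add (dest = src1 * src2 + src3) and does not use any SSE5 intrinsic. The model shows only the operand structure and makes no attempt to reproduce the hardware's rounding behavior.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_mul_ps, _mm_add_ps */
#endif

/* Two-step form: p[i] = p[i] + src1[i] * src2[i] for 4 floats,
   corresponding to MULPS followed by ADDPS. */
void mul_then_add4(float *p, const float *src1, const float *src2)
{
#if defined(__SSE__)
    __m128 acc  = _mm_loadu_ps(p);      /* xmm0 = p    */
    __m128 a    = _mm_loadu_ps(src1);   /* xmm1 = src1 */
    __m128 b    = _mm_loadu_ps(src2);   /* xmm2 = src2 */
    __m128 prod = _mm_mul_ps(a, b);     /* mulps       */
    acc = _mm_add_ps(acc, prod);        /* addps       */
    _mm_storeu_ps(p, acc);
#else
    /* Scalar fallback when SSE is not available at compile time. */
    for (int i = 0; i < 4; i++)
        p[i] = p[i] + src1[i] * src2[i];
#endif
}

/* Hypothetical C model of the four-operand form:
   dest[i] = src1[i] * src2[i] + src3[i] for 4 floats.
   dest may be the same array as src3, as in the assembly example. */
void madd4_model(float *dest, const float *src1,
                 const float *src2, const float *src3)
{
    for (int i = 0; i < 4; i++)
        dest[i] = src1[i] * src2[i] + src3[i];
}
```

Calling `madd4_model(p, src1, src2, p)` mirrors `fmaddps xmm0, xmm1, xmm2, xmm0`: the accumulator is both the destination and the third source, so no separate register is needed for the intermediate product.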