Native Windows Code
In the fall of 2007, AMD released version 1.1 of its acclaimed performance library. This library, officially called AMD Performance Library (or APL), contains hundreds of functions that enable developers to extract maximum benefit from x86 silicon without knowing the ins and outs of each processor release. Clever design of the library enables it to detect and use specific performance features of each generation of x86 processors. So if you want to exploit SSE3, you can use the APL to do this without having to drop to the assembly-language level. If you ship your software to a customer whose platform supports only SSE2, your code will run fine. A dispatcher that is invoked when the library is first accessed determines the processor capabilities—such as which release of streaming SIMD extensions is supported—and it loads the fastest version of the code to run on that platform. As x86 generations add processor extensions, your code can make use of new features simply by having your customers update their version of the APL.
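The dispatch idea described above can be illustrated with a minimal sketch in C. This is not APL's dispatcher, whose internals are not described in this article; every name here (cpu_level, detect_cpu_level, select_add, add_sat_8u) is hypothetical, and the "SSE2" and "SSE3" variants are plain-C stand-ins. Detection assumes the GCC/Clang builtin __builtin_cpu_supports on x86 and falls back to the baseline elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature levels, ordered from slowest to fastest code path. */
typedef enum { CPU_BASELINE = 0, CPU_SSE2 = 1, CPU_SSE3 = 2 } cpu_level;

/* Report the best level the running processor supports.
   Uses the GCC/Clang builtin on x86; any other target reports the baseline. */
static cpu_level detect_cpu_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse3")) return CPU_SSE3;
    if (__builtin_cpu_supports("sse2")) return CPU_SSE2;
#endif
    return CPU_BASELINE;
}

typedef void (*add_fn)(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n);

/* Portable implementation: saturating add of two 8-bit buffers. */
static void add_baseline(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Stand-ins: a real library would use SSE2/SSE3 intrinsics in these. */
static void add_sse2(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    add_baseline(a, b, dst, n);
}

static void add_sse3(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    add_baseline(a, b, dst, n);
}

/* Map a feature level to the fastest implementation available for it. */
static add_fn select_add(cpu_level level)
{
    switch (level) {
    case CPU_SSE3: return add_sse3;
    case CPU_SSE2: return add_sse2;
    default:       return add_baseline;
    }
}

/* Bound on first use; this lazy binding is not thread-safe in this sketch. */
static add_fn bound_add = NULL;

/* Public entry point: callers never see which variant runs. */
void add_sat_8u(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    assert(a != NULL && b != NULL && dst != NULL);
    if (bound_add == NULL)
        bound_add = select_add(detect_cpu_level());
    bound_add(a, b, dst, n);
}
```

The caller only ever sees add_sat_8u. Supporting a newer instruction set means adding another variant and another case in select_add, which mirrors why updating the library is enough to pick up new processor features.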
APL ships today for Windows®, Linux®, and Solaris x86 platforms. This article discusses using the library for Windows development. A complementary article discusses similar topics on Linux.
The first step, of course, is to download the APL. It is available here:/cpu/libraries/Pages/default.aspx. Note that unlike performance libraries from other processor vendors, the APL is available at no cost. The only requirement AMD imposes prior to download is that you register at the site and agree to the license. The major license restrictions are that no disassembly is allowed and AMD isn't responsible for what you do with the software. Fair enough.
After you download the library, run the .exe file. (Note: The instructions in the FAQ on the download page differ from this procedure. They apply to other files on the download page.) This executable is a decompressor for the full library. You'll then find you have a dll directory (containing DLLs and their .lib counterparts) and a lib directory (which contains libraries for static linking). The header files are stored in the main directory, along with the license terms and a readme file that gives a quick guide to which functions are associated with which header and library files.
With Microsoft® Visual Studio, using these libraries is no different than using any other libraries. DLLs are placed somewhere on the execution path; lib files are in the library search path for Visual Studio, and the headers are added to the include search path.
AMD strongly suggests that developers use the DLL versions of the libraries rather than opting for static linking. This approach enables you to update customers' libraries without disrupting their applications.
If you use a static library, compile with the /MT switch, which tells Visual Studio to link against the static multithreaded version of the C runtime library. Likewise, for debug builds of your software, set the /MTd switch and use the multithreaded debug static libraries. If your link step generates errors and you've checked your link path, then mismatched settings for these switches are the likely cause. More on static linking for the APL on Windows can be found at http://developer.amd.com/assets/APL_libraries_and_linker_issues.pdf.
Note that the APL comes with 32- and 64-bit versions of the library, so choose accordingly for your intended destination platform. Recall that 32-bit apps can run on 32- and 64-bit platforms, as long as all parts of the app are 32-bit. (There is no support for mixing 32- and 64-bit components within the same application.) 64-bit apps, of course, can run only on 64-bit versions of Windows.
The APL is a native Win32 library. The easiest way to use it from within .NET is via the P/Invoke mechanism. The managed code applies the DllImport attribute to the function declaration to name the DLL that contains the function implementation. The declaration may also specify how the function parameters and return value are to be marshaled across the managed/native boundary.
Consider, for example, the function apliThreshold_LTValGTVal_8u_C3R, which compares and replaces values in an image. It is in the aplImage library of the APL. This function takes nine parameters:
- const Apl8u *pSrc - pointer to the source buffer
- int srcStep - source image step size
- Apl8u *pDst - pointer to the destination buffer
- int dstStep - destination image step size
- ApliSize roiSize - a struct describing the size (width and height) of the region of interest
- const Apl8u thresholdLT[3] - a 3-element array of lower threshold values, one per channel
- const Apl8u valueLT[3] - a 3-element array of replacement values for pixels below thresholdLT
- const Apl8u thresholdGT[3] - a 3-element array of upper threshold values, one per channel
- const Apl8u valueGT[3] - a 3-element array of replacement values for pixels above thresholdGT
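To make the roles of the step sizes, the region of interest, and the four arrays concrete, here is a plain-C reference sketch of the operation. It is not the APL implementation. The semantics (values below thresholdLT become valueLT, values above thresholdGT become valueGT, everything else is copied) are an assumption inferred from the function name, and the names RoiSize and ref_threshold_ltval_gtval_8u_c3r are hypothetical. The APL documentation remains the authority on exact behavior and return codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the library's ROI struct: width and height in pixels. */
typedef struct { int width; int height; } RoiSize;

/* Reference sketch for an 8-bit, 3-channel, ROI-based threshold.
   Steps are in bytes and may exceed width * 3 when rows are padded.
   Returns 0 on success and -1 on invalid arguments (assumed convention). */
static int ref_threshold_ltval_gtval_8u_c3r(
    const uint8_t *src, int src_step,
    uint8_t *dst, int dst_step,
    RoiSize roi,
    const uint8_t threshold_lt[3], const uint8_t value_lt[3],
    const uint8_t threshold_gt[3], const uint8_t value_gt[3])
{
    assert(threshold_lt && value_lt && threshold_gt && value_gt);

    if (src == NULL || dst == NULL) return -1;
    if (roi.width <= 0 || roi.height <= 0) return -1;
    if (src_step < roi.width * 3 || dst_step < roi.width * 3) return -1;

    for (int y = 0; y < roi.height; ++y) {
        /* Advance by the step, not by width * 3, to skip row padding. */
        const uint8_t *s = src + (size_t)y * (size_t)src_step;
        uint8_t *d = dst + (size_t)y * (size_t)dst_step;

        for (int x = 0; x < roi.width; ++x) {
            for (int c = 0; c < 3; ++c) {
                uint8_t v = s[3 * x + c];
                if (v < threshold_lt[c])
                    v = value_lt[c];      /* below the lower threshold */
                else if (v > threshold_gt[c])
                    v = value_gt[c];      /* above the upper threshold */
                d[3 * x + c] = v;         /* otherwise copied unchanged */
            }
        }
    }
    return 0;
}
```

Each channel is tested against its own pair of thresholds, which is why every array has three elements. Bytes that lie beyond width * 3 in a row are padding and are never read or written.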
Here is how this function is accessed from the principal .NET languages:
C# example:
class Program
{
[DllImport("aplImage.dll")]
public static extern int apliThreshold_LTValGTVal_8u_C3R(
byte[] pSrc, // source buffer
int srcStep, // source step size (in bytes)
byte[] pDst, // destination buffer
int dstStep, // destination step size
ApliSize roiSize, // size of the region of interest
byte[] thresholdLT, // 3 element array of values
byte[] valueLT, // 3 element array of values
byte[] thresholdGT, // 3 element array of values
byte[] valueGT // 3 element array of values
);
VB.NET example:
Declare Function apliThreshold_LTValGTVal_8u_C3R Lib "aplImage.dll" ( _
ByVal pSrc() As Byte, _
ByVal srcStep As Integer, _
ByVal pDst() As Byte, _
ByVal dstStep As Integer, _
ByVal roiSize As ApliSize, _
ByVal thresholdLT() As Byte, _
ByVal valueLT() As Byte, _
ByVal thresholdGT() As Byte, _
ByVal valueGT() As Byte _
) As Integer
J# example:
public class Program
{
/** @dll.import("aplImage.dll") */
public static native int apliThreshold_LTValGTVal_8u_C3R(
ubyte[] pSrc, // source buffer
int srcStep, // source step size (in bytes)
ubyte[] pDst, // destination buffer
int dstStep, // destination step size
ApliSize roiSize, // size of the region of interest
ubyte[] thresholdLT, // 3 element array of values
ubyte[] valueLT, // 3 element array of values
ubyte[] thresholdGT, // 3 element array of values
ubyte[] valueGT // 3 element array of values
);
...
MC++ example:
[DllImport("aplImage.dll",CallingConvention=CallingConvention::StdCall)]
int apliThreshold_LTValGTVal_8u_C3R(
array<Byte> ^ pSrc, // source buffer
int srcStep, // source step size (in bytes)
array<Byte> ^ pDst, // destination buffer
int dstStep, // destination step size
ApliSize roiSize, // size of the region of interest
array<Byte> ^ thresholdLT, // 3 element array of values
array<Byte> ^ valueLT, // 3 element array of values
array<Byte> ^ thresholdGT, // 3 element array of values
array<Byte> ^ valueGT // 3 element array of values
);
Note that all APL functions use the StdCall calling convention. The CallingConvention field of the DllImport attribute defaults to Winapi, which resolves to StdCall on Windows, so the convention does not need to be specified explicitly. The same default is what allows the use of the Declare statement in Visual Basic code.
The preceding examples refer to roiSize, which is a structure that contains the size of the region of interest (or ROI). In APL parlance, the ROI is the area of the image upon which an operation is being performed. It is defined using a structure that in native C/C++ appears as:
typedef struct
{
int width;
int height;
} ApliSize;
In the managed code definition of this structure, it is important to use the StructLayout attribute to ensure that the members of the structure are in the correct order and that the structure has the correct size. Examples of how to do this in the major .NET languages include:
C# example:
[StructLayout(LayoutKind.Sequential)]
public struct ApliSize
{
public int width;
public int height;
}
VB.NET example:
<StructLayout(LayoutKind.Sequential)> _
Public Structure ApliSize
Public width As Integer
Public height As Integer
End Structure 'ApliSize
J# example:
/** @attribute StructLayout(LayoutKind.Sequential)
*/
public final class ApliSize extends System.ValueType
{
public int width;
public int height;
}
MC++ example:
[StructLayout(LayoutKind::Sequential)]
value struct ApliSize
{
public:
int width;
int height;
};
It is very important that the code define ApliSize as a value type and not as an object (reference) type. In C#, VB.NET, and MC++, this is done by defining it as a struct or structure instead of a class. In J#, the definition must use the extends System.ValueType directive. The J# code requires Visual Studio 2005 or later to compile.
If ApliSize were defined as an object type, it would be passed to the native function as a pointer. Only by defining it as a value type can we ensure that it is passed by value (which is exactly what the native function expects in 32-bit mode). During the P/Invoke call to the native function, the .NET Framework automatically pins the managed parameters in memory and, in most cases, provides the necessary marshaling.
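The two requirements discussed above, matching layout and by-value passing, can be seen from the native side with a small C sketch. The struct is the one shown earlier; the helper functions (layout_matches, area_by_value, area_by_pointer) are hypothetical and exist only for illustration. The layout check assumes a mainstream ABI in which two int members are stored back to back without padding.

```c
#include <assert.h>
#include <stddef.h>

/* Native definition, as given in the article. */
typedef struct
{
    int width;
    int height;
} ApliSize;

/* Returns 1 when the native layout is what a sequential managed struct
   with two 32-bit integers must reproduce: width first, height second,
   and no padding between or after them. */
static int layout_matches(void)
{
    return offsetof(ApliSize, width) == 0
        && offsetof(ApliSize, height) == sizeof(int)
        && sizeof(ApliSize) == 2 * sizeof(int);
}

/* By value: the callee works on a private copy of the struct. */
static int area_by_value(ApliSize roi)
{
    roi.width *= 2;                 /* changes only the copy */
    return roi.width * roi.height;
}

/* By pointer: what the callee would receive if the managed side
   declared the type as a reference type. */
static int area_by_pointer(ApliSize *roi)
{
    assert(roi != NULL);
    roi->width *= 2;                /* changes the caller's struct */
    return roi->width * roi->height;
}
```

A function compiled to take ApliSize by value reads the two integers directly from its argument area. If the caller supplied a pointer instead, the callee would interpret the address bits as the width, which is why the managed declaration has to be a value type.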
Calling the APL from native code is not difficult; it relies simply on configuring the Visual Studio environment as described in the Getting Started section earlier. AMD provides two sample programs in C/C++ that exercise many of the functions of the library (see Resources). If you want a smaller example with greater detail, including source code and project files for Visual Studio, see the article Using the New AMD Performance Library at http://developer.amd.com/TechnicalArticles/Articles/Pages/3122007131.aspx. It shows how to use the library to brighten or darken an image (sample images are included in the article downloads).
The library contains literally hundreds of functions, many of which are clustered into families whose primary internal differences are the size of the variables they operate on. It takes a little while to familiarize yourself with the operations and the corresponding nomenclature used by the APL functions. But once you do, you'll have the key to a high-performance imaging and multimedia engine that extracts every last bit of performance from the underlying silicon and that you can provide to customers and users at no cost.
Resources
Anderson Bailey is a developer with a longstanding interest in the techniques for using code to exploit processor features. He can be reached at chip.coder@gmail.com.
Patryk Kaminski is a member of AMD Technical Staff. He is a lead developer for APL at AMD.