At the Game Developers’ Conference® earlier this year, I gave a presentation about low-overhead OpenGL together with my peers at competing companies. It was a collaborative effort in which we explored techniques that drastically reduce the cost of the driver stack in OpenGL. The presentation was titled “Approaching Zero Driver Overhead”, and the term AZDO has since come to cover the suite of features we talked about.
For the most part, the features of AZDO are core components of current OpenGL. This means that it’s compulsory to support them in recent versions of the API. Other features are accessible via extensions which, in practice, see widespread cross-vendor, cross-platform support. It might be reasonable to require those extensions for certain features of an application to function. All AMD GPUs based on the GCN architecture can support all of the features described here.
The first and perhaps most exciting of these is “persistently mapped buffers,” or PMBs. They were introduced with OpenGL 4.4 and are explained in some depth in the buffer storage extension (ARB_buffer_storage) that accompanies it. Because the feature is part of core OpenGL 4.4, every OpenGL 4.4 implementation must support it.
In OpenGL and other graphics APIs, a buffer represents a piece of memory that the GPU can read from or write to. It is possible to “map” a buffer, which allows the CPU to read from and write to it as well. The act of mapping a buffer can be quite expensive. In older versions of OpenGL and in most other graphics APIs, using the buffer from the GPU and from the CPU at the same time is not allowed and will produce errors or undefined behavior. This means that applications need to continuously map and un-map buffers as they update their contents.
OpenGL 4.4 made it permissible and well defined to read and write buffers from both the CPU and GPU simultaneously, so long as you say up-front that you want to use the buffer that way. This allows applications to simply map the buffer once, leave it mapped and update it at will, completely avoiding the overhead of the mapping operations. This feature is sometimes called “zero-copy” as it avoids many of the copy operations associated with graphics, and is particularly interesting on shared memory architectures such as AMD APUs.
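To make the "map once, write at will" pattern concrete, here is a minimal CPU-only sketch. It makes several assumptions that are mine and not from the presentation: a `std::vector` stands in for the pointer that a real application would obtain once from `glMapBufferRange` (with `GL_MAP_PERSISTENT_BIT`) on a buffer created with `glBufferStorage`, the buffer is split into a ring of regions, and a simple flag per region stands in for a fence created with `glFenceSync`. The class name `PersistentRing` and its methods are hypothetical. The reason for the ring is that, even with persistent mapping, the application still has to avoid overwriting data the GPU has not finished reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU-only stand-in for a persistently mapped buffer split into regions.
// Assumption: in real code `storage_` would be the pointer returned ONCE by
// glMapBufferRange(..., GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
// GL_MAP_COHERENT_BIT) on a buffer created with glBufferStorage.
class PersistentRing {
public:
    PersistentRing(std::size_t regionSize, std::size_t regionCount)
        : regionSize_(regionSize),
          storage_(regionSize * regionCount),
          inFlight_(regionCount, false) {}

    // Try to claim the current region for CPU writes. Returns false while the
    // (simulated) GPU is still reading it; real code would wait on that
    // region's fence with glClientWaitSync instead of returning.
    bool beginFrame(std::size_t& offsetOut) const {
        if (inFlight_[current_]) return false;
        offsetOut = current_ * regionSize_;
        return true;
    }

    // Write straight into the mapped memory: no map/unmap call per update.
    void write(std::size_t offset, const void* data, std::size_t size) {
        assert(offset + size <= storage_.size());
        std::memcpy(storage_.data() + offset, data, size);
    }

    // Hand the region to the GPU and advance to the next one.
    // Real code would record a fence with glFenceSync at this point.
    void endFrame() {
        inFlight_[current_] = true;
        current_ = (current_ + 1) % inFlight_.size();
    }

    // Simulates the fence for `region` being signaled.
    void gpuFinished(std::size_t region) { inFlight_[region] = false; }

    const unsigned char* data() const { return storage_.data(); }

private:
    std::size_t regionSize_;
    std::vector<unsigned char> storage_;
    std::vector<bool> inFlight_;
    std::size_t current_ = 0;
};
```

With three regions, the CPU can fill one region while the GPU consumes the other two, and no mapping call appears anywhere in the per-frame path.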
The second feature of AZDO is “bindless textures.” Textures represent specially formatted data – typically image data – which applications can use to apply detail to objects. In traditional OpenGL and some other graphics APIs, a limited set of textures can be made available at any one time, and an application must move textures in and out of this set as it draws the scene. This is known as “binding” textures and can also be quite expensive in terms of performance.
With bindless textures, that fixed set is gone. Instead, textures are given unique “handles”, which are made accessible to shaders just like any other variable that might represent a color, a matrix or other data. The shader can do with those variables as it chooses, and can use as many of them as necessary to achieve a desired effect. This has two very important implications for applications: the overhead of texture binding is eliminated, and the number of textures that can be used by a single shader or set of shaders is limited only by available resources, not by an API.
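The difference between the two models can be sketched on the CPU side without any OpenGL calls. The code below is an illustration under my own assumptions, not code from the presentation: `countBinds` models a traditional renderer with a single texture unit, and `buildHandleTable` models the bindless approach, where one 64-bit handle per draw is written into a flat array that would be uploaded to a buffer. Both function names and the `handleOfTexture` lookup are hypothetical; in a real application the handles would come from `glGetTextureHandleARB` and be made resident with `glMakeTextureHandleResidentARB`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same width as the GLuint64 that glGetTextureHandleARB returns.
using TextureHandle = std::uint64_t;

// Traditional model, simplified to one texture unit: count how many bind
// calls a sequence of draws needs when each draw samples one texture.
std::size_t countBinds(const std::vector<std::uint32_t>& textureForDraw) {
    std::size_t binds = 0;
    bool anyBound = false;
    std::uint32_t bound = 0;
    for (std::uint32_t tex : textureForDraw) {
        if (!anyBound || tex != bound) {
            bound = tex;
            anyBound = true;
            ++binds;
        }
    }
    return binds;
}

// Bindless model: gather one handle per draw into a flat array that would be
// uploaded to a buffer once; no bind calls are issued while drawing.
// Assumption: `handleOfTexture` is a hypothetical table filled at load time.
std::vector<TextureHandle> buildHandleTable(
    const std::vector<std::uint32_t>& textureForDraw,
    const std::vector<TextureHandle>& handleOfTexture) {
    std::vector<TextureHandle> table;
    table.reserve(textureForDraw.size());
    for (std::uint32_t tex : textureForDraw) {
        assert(tex < handleOfTexture.size());
        table.push_back(handleOfTexture[tex]);
    }
    return table;
}
```

In the traditional path the cost grows with every texture change between draws. In the bindless path the table is built once and the shader simply reads the handle it needs.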
The third feature of AZDO is the ability to place a list of drawing commands in memory and send them to the GPU for execution in one large batch. This technique has become known as “multi-draw indirect” (MDI), named for the form of the OpenGL functions it includes, such as glMultiDrawArraysIndirect and glMultiDrawElementsIndirect. We’ve measured this feature pushing our GPUs to millions of draws each second.
To be fair, MDI is not a panacea: there are significant limitations on what can differ between the draws in a batch, which may make it difficult to integrate into a traditional rendering engine. However, many of the other AZDO features aim to remedy this. In particular, the index of the draw is passed to the shader, where it can be used to index into arrays stored in buffers that contain bindless texture handles, constants and other data.
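As a rough sketch of how a batch might be assembled on the CPU, the code below fills an array of commands in the layout that the OpenGL specification defines for `glMultiDrawElementsIndirect`, alongside a parallel array of per-draw data. The `SceneObject` and `DrawBatch` types, the `buildBatch` function and the idea of storing a material index per draw are assumptions made for illustration. A shader would look up entry `i` of the parallel array using the draw index, which is exposed as `gl_DrawIDARB` by the ARB_shader_draw_parameters extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field layout the OpenGL specification defines for each element consumed by
// glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // value added to each index
    std::uint32_t baseInstance;   // first instance ID
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

// Hypothetical description of one object that shares vertex and index
// buffers with every other object in the batch.
struct SceneObject {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t materialIndex;
};

// commands[i] and materialForDraw[i] describe the same draw, so a shader can
// use the draw index to find its own per-draw data.
struct DrawBatch {
    std::vector<DrawElementsIndirectCommand> commands;
    std::vector<std::uint32_t> materialForDraw;
};

DrawBatch buildBatch(const std::vector<SceneObject>& objects) {
    DrawBatch batch;
    batch.commands.reserve(objects.size());
    batch.materialForDraw.reserve(objects.size());
    for (const SceneObject& obj : objects) {
        // One instance per object; baseInstance is left at zero here.
        batch.commands.push_back(
            {obj.indexCount, 1u, obj.firstIndex, obj.baseVertex, 0u});
        batch.materialForDraw.push_back(obj.materialIndex);
    }
    return batch;
}
```

Both arrays would be uploaded to buffers, and the whole batch would then be submitted with a single `glMultiDrawElementsIndirect` call instead of one draw call per object.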
Finally we come to sparse textures, which are also known as partially resident textures. Combined with bindless textures and array textures (large objects that aggregate many textures together), this feature provides applications with significant flexibility in how they manage memory. Sparse textures separate the dimensions, format and other attributes of a texture from the underlying memory that holds the texture’s data. Once the texture has been created, the application can allocate real memory to store parts of it on demand. This enables texture streaming, large atlases and arrays with elements “missing”.
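The bookkeeping behind on-demand commitment can be sketched as follows. This is a CPU-only model under my own assumptions: the `SparseTexture2D` class and its methods are hypothetical, and the page size is passed in as a parameter, whereas a real application would query it (for example `GL_VIRTUAL_PAGE_SIZE_X_ARB` through `glGetInternalformativ`) and would back pages with memory by calling `glTexPageCommitmentARB` from the ARB_sparse_texture extension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tracks which pages of a 2D texture are backed by memory. The texture's
// dimensions exist up front; memory is attached page by page on demand.
class SparseTexture2D {
public:
    SparseTexture2D(int width, int height, int pageW, int pageH)
        : width_(width), height_(height), pageW_(pageW), pageH_(pageH),
          pagesX_((width + pageW - 1) / pageW),
          pagesY_((height + pageH - 1) / pageH),
          resident_(static_cast<std::size_t>(pagesX_) *
                    static_cast<std::size_t>(pagesY_), false) {}

    // Mark every page touched by the texel rectangle as resident and return
    // how many pages were newly committed. Real code would pass the
    // page-aligned region to glTexPageCommitmentARB with commit = GL_TRUE.
    int commit(int x, int y, int w, int h) {
        assert(x >= 0 && y >= 0 && w > 0 && h > 0);
        assert(x + w <= width_ && y + h <= height_);
        const int px0 = x / pageW_;
        const int py0 = y / pageH_;
        const int px1 = std::min((x + w - 1) / pageW_, pagesX_ - 1);
        const int py1 = std::min((y + h - 1) / pageH_, pagesY_ - 1);
        int added = 0;
        for (int py = py0; py <= py1; ++py) {
            for (int px = px0; px <= px1; ++px) {
                const std::size_t i = index(px, py);
                if (!resident_[i]) {
                    resident_[i] = true;
                    ++added;
                }
            }
        }
        return added;
    }

    // True if the page containing texel (x, y) has memory behind it.
    bool isResident(int x, int y) const {
        return resident_[index(x / pageW_, y / pageH_)];
    }

    int residentPages() const {
        return static_cast<int>(
            std::count(resident_.begin(), resident_.end(), true));
    }

    int totalPages() const { return pagesX_ * pagesY_; }

private:
    std::size_t index(int px, int py) const {
        return static_cast<std::size_t>(py) *
               static_cast<std::size_t>(pagesX_) +
               static_cast<std::size_t>(px);
    }

    int width_, height_, pageW_, pageH_, pagesX_, pagesY_;
    std::vector<bool> resident_;
};
```

A streaming system could call `commit` for the regions that become visible and leave the rest of the texture without backing memory, which is what makes very large atlases and arrays with “missing” elements practical.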
To summarize, OpenGL is an important vehicle with which to expose new and exciting features of modern GPUs. It is the most widely available cross-platform, cross-vendor API. It is extensible, and through the use of its advanced techniques it can deliver extremely high performance. The features discussed in this post, and many more like them, are available right now.
Graham Sellers is the architect for AMD OpenGL drivers and the author of several books on OpenGL. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.