With the preliminary architectural discussion out of the way, we can turn to Direct3D 10. The goals of Direct3D 10 are:
- Consistency: A major departure from previous versions of Direct3D is that hardware scales on performance, not capabilities.
- Performance: The idea here is simple: render more (objects, textures, noise, shadows) with less (CPU burden, bandwidth). In particular, the problem of small-batch performance has to be faced: CPUs and GPUs are both becoming highly parallel race cars, but the communication between them is not advancing at the same rate and has to be viewed as a precious commodity.
- Generalization: Developers have been incredibly clever using textures for a wide variety of purposes (render to texture, shadow maps, cube maps, etc.). Direct3D 10 introduces a resource model whose shader resource views allow you to cast / reinterpret resources in a more direct and explicit manner.
- Visual effects: Direct3D 10 is architected to exploit the best modern GPU architectures such as the ATI Radeon™ HD2900 graphics technology and to allow future GPU architectures to shine.
These are admirable goals, and almost everyone who's spent time with D3D10 is enthusiastic about it. However, there is no question that the changes and new capabilities have introduced a number of areas where porting "gotchas" can occur and performance can suffer (although one of Microsoft's goals with the design was to eliminate “performance cliffs”).
Before getting to the new and changed behaviors, let's briefly review some of what has been removed:
- The fixed function pipeline
- Alpha test
- Triangle fans
- Point sprites
- Wrap-texture modes
- TnL clip planes
If you've read anything about D3D10, you've probably noticed something missing from this list. “CAP BITS! No more CAP BITS, right?” Well, yes (“…and there was much rejoicing…”) and no. There are optional features within D3D10 related to the resource formats, including Multisample Antialiasing (MSAA) and FP32 filtering (FP32 blending support is required). So the old “don't assume, but check,” imperative remains, although the form has changed to ID3D10Device::CheckFormatSupport().
ID3D10Device::CheckFormatSupport() reports, through its output parameter, a bitmask of D3D10_FORMAT_SUPPORT flags for the queried format. MSAA capabilities are indicated by:
- D3D10_FORMAT_SUPPORT_MULTISAMPLE_RENDERTARGET = 0x200000
- D3D10_FORMAT_SUPPORT_MULTISAMPLE_RESOLVE = 0x40000
- D3D10_FORMAT_SUPPORT_MULTISAMPLE_LOAD = 0x400000
The last refers to the ability of a format to be used as a multisampled texture and read using the HLSL Load() function. The first indicates that the format is renderable, and the second that it is resolvable. There's not much use for an MSAA format you can't render to, but one that is both renderable and accessible as a texture is fine: it can be resolved by a custom shader in the app.
The point is that you can't assume you can do anything you want (resolve, sample) with the format just by checking the first bit. In fact, there could be (useless) formats that expose only the rendertarget bit.
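As a rough illustration of why one bit is not enough, here is a minimal sketch that interprets the support mask. The flag values are the ones listed above; the struct and function names are made up for this example, and in a real application the mask would come from CheckFormatSupport() and the constants from the D3D10 headers.

```cpp
#include <cstdint>

// Flag values copied from the list above; a real build gets them from <d3d10.h>.
const std::uint32_t kMsaaResolve      = 0x40000;   // ..._MULTISAMPLE_RESOLVE
const std::uint32_t kMsaaRenderTarget = 0x200000;  // ..._MULTISAMPLE_RENDERTARGET
const std::uint32_t kMsaaLoad         = 0x400000;  // ..._MULTISAMPLE_LOAD

// Hypothetical summary of what an application can do with an MSAA format.
struct MsaaUsage {
    bool renderable;    // can be bound as a multisampled render target
    bool fixedResolve;  // can be resolved by the API
    bool shaderLoad;    // can be read per sample with HLSL Load()
    bool usable;        // renderable AND at least one way to get the result out
};

// 'supportBits' stands for the mask reported for one format.
inline MsaaUsage classifyMsaaSupport(std::uint32_t supportBits) {
    const bool rt   = (supportBits & kMsaaRenderTarget) != 0;
    const bool res  = (supportBits & kMsaaResolve) != 0;
    const bool load = (supportBits & kMsaaLoad) != 0;
    return MsaaUsage{rt, res, load, rt && (res || load)};
}
```

A format that reports only the rendertarget bit comes back with usable set to false, which is the "useless" case described above.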
From a programming standpoint, the evolution away from fixed-function to shader-based graphics is essentially complete in D3D10, and the rasterization and floating point specifications have been tightened up.
An example of the increased consistency is a small but welcome modification. Pixels and texels now use the same coordinate system; it is no longer necessary to offset positions or texture coordinates by 0.5 ("... and there was much rejoicing..."). Of course, this will affect the screen space in any existing code, and will be a porting issue for virtually all Direct3D 9 (D3D9) code. AMD recommends that if the 0.5 coordinate offset is done within a pixel shader, that code ought to be moved into the application, so that the shaders can remain as compatible as possible between D3D9 and D3D10.
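The arithmetic behind the offset can be sketched in a few lines. This is only an illustration, assuming a full-screen quad whose U coordinate runs from 0 at its left edge to 1 at its right edge; the enum and function names are invented for the example. In the D3D9 convention pixel centers sit on integer coordinates, so the quad has to be shifted by half a pixel; in D3D10 pixel centers already sit at +0.5 and no shift is needed.

```cpp
#include <cassert>

enum class PixelConvention { D3D9, D3D10 };

// Screen-space x of the center of pixel column 'px'.
inline float pixelCenterX(PixelConvention c, int px) {
    return c == PixelConvention::D3D9 ? float(px) : float(px) + 0.5f;
}

// Left edge the quad needs so that texels map 1:1 onto pixels.
inline float quadLeftEdge(PixelConvention c) {
    return c == PixelConvention::D3D9 ? -0.5f : 0.0f;
}

// U coordinate interpolated at pixel column 'px' for a quad 'width' pixels wide.
inline float sampledU(PixelConvention c, int px, int width) {
    assert(width > 0);
    return (pixelCenterX(c, px) - quadLeftEdge(c)) / float(width);
}
```

With the shift applied on the D3D9 side, both conventions sample the texel center (px + 0.5) / width. Keeping that shift in application code, as recommended above, is what lets the shader stay identical.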
The "small batch problem," the amount of overhead introduced per object rendered in the game, has been significantly reduced in D3D10 by the API redesign. AMD has measured up to a 2X improvement over D3D9 just from the API changes. Improved features such as instancing and uber-shaders, and the exciting new feature of geometry shaders, could provide an additional boost. This type of improvement is all predicated on the use of the best current-day GPUs and the expectation that gamers will continue to value the graphics capabilities of new generations of graphics hardware.
The idea of instancing is to draw as many objects per draw call as possible. Within the frame, object variations can be created using features such as large constant storage, techniques such as displacement mapping, and the modification of per-instance data as specified by the input layout. So-called "uber-shaders" combine multiple materials within a single shader, which improves locality in the on-chip shader cache but has the downside of more complex flow control and the use of more registers (higher GPR pressure).

Figure 2 The new pipeline
Source: Gamefest Unplugged (Europe) 2007: D3D10 Unleashed - New Features and Effects
The D3D10 graphics pipeline has changed significantly (Figure 2). Device calls within the pipeline have been renamed to match the specific stage with which they are associated. The majority of the API calls that you make will have prefixes such as:
- ID3D10Device::IA_ (Input Assembler)
- ID3D10Device::VS_ (Vertex Shader)
- ID3D10Device::GS_ (Geometry Shader)
- ID3D10Device::SO_ (Stream Out)
- ID3D10Device::RS_ (Rasterizer)
- ID3D10Device::PS_ (Pixel Shader)
- ID3D10Device::OM_ (Output Merger)
The typical drawing loop will require you to:
- Update your vertex buffers with IASetVertexBuffers()
- Set your index buffer with IASetIndexBuffer()
- Set vertex, geometry, and pixel shaders with calls to (VS|GS|PS)SetShader()
- Update constants with (VS|GS|PS)SetConstantBuffers() and resources with (VS|GS|PS)SetShaderResources()
- Set your state objects
- Draw into the buffer using ID3D10Device::IASetPrimitiveTopology() and ID3D10Device::Draw()
- Call IDXGISwapChain::Present()
Again, it's different from D3D9, but conceptually it's not all that distant; it's consistent and, ultimately, cleaner. Note that in the sixth step you should keep the IASetPrimitiveTopology() and Draw() calls together: it's too easy to forget to set the proper topology, and the resulting behavior is likely to cause a wild-goose chase of a debugging session.
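One way to keep the two calls together is to wrap them in a single helper. The following is a minimal sketch, not a prescribed pattern: drawWithTopology, RecordingDevice and FakeTopology are hypothetical names, and the recording device only stands in for ID3D10Device so that the example runs without a GPU. The helper relies on nothing beyond the two member functions named in the loop above.

```cpp
#include <string>
#include <vector>

// Sets the topology immediately before drawing so the two cannot drift apart.
// 'Device' is any type exposing IASetPrimitiveTopology() and Draw().
template <typename Device, typename Topology>
void drawWithTopology(Device& device, Topology topology,
                      unsigned vertexCount, unsigned startVertex) {
    device.IASetPrimitiveTopology(topology);
    device.Draw(vertexCount, startVertex);
}

// Hypothetical stand-in that records the order of calls it receives.
enum class FakeTopology { TriangleList, TriangleStrip };

struct RecordingDevice {
    std::vector<std::string> calls;
    void IASetPrimitiveTopology(FakeTopology) {
        calls.push_back("IASetPrimitiveTopology");
    }
    void Draw(unsigned, unsigned) { calls.push_back("Draw"); }
};
```

Calling drawWithTopology(device, FakeTopology::TriangleList, 36, 0) on a RecordingDevice leaves the two calls in the log in that order. The cost is a redundant topology set when consecutive draws share a topology, which is the tradeoff for never drawing with a stale one.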
Geometry shaders are the splashiest of D3D10's features. The graphics pipeline (Figure 2) now contains a separate shader stage that allows for "geometry amplification." A geometry shader can output as many as 1024 DWORDs of data per invocation (the GS outputs only strips, while streamed-out data is always expanded from strips to lists). Perhaps the most obvious geometry shader use would be for shadow volumes. Other uses would be for fur, mesh tessellation, or particle systems that run entirely on the GPU, and as a replacement for “old school” point sprites.
An interesting use of geometry shaders to combat the small-batch performance problem is using a GS to replicate triangles to different cubemap faces. This is a good example of the D3D10 tradeoff: reduced calls to Draw() on the part of the CPU, sure, but also a significant burden on the GPU. Listing 1 shows an implementation by AMD's Guennadi Riguer of this technique.
LISTING 1
[maxvertexcount(18)]
void main(triangle GsInShadow In[3], inout TriangleStream<PsInShadow> Stream)
{
    PsInShadow Out;
    // Loop through all six cube faces
    [unroll]
    for (int k = 0; k < 6; k++)
    {
        // Select face target
        Out.target = k;
        // Transform verts
        float4 pos[3];
        pos[0] = mul(mvpArray[k], In[0].pos);
        pos[1] = mul(mvpArray[k], In[1].pos);
        pos[2] = mul(mvpArray[k], In[2].pos);
        // Frustum culling
        float4 t0 = saturate(pos[0].xyxy * float4(-1, -1, 1, 1) - pos[0].w);
        float4 t1 = saturate(pos[1].xyxy * float4(-1, -1, 1, 1) - pos[1].w);
        float4 t2 = saturate(pos[2].xyxy * float4(-1, -1, 1, 1) - pos[2].w);
        float4 t = t0 * t1 * t2;
        [branch]
        if (!any(t))
        {
            // Back face culling
            float2 d0 = pos[1].xy / abs(pos[1].w) - pos[0].xy / abs(pos[0].w);
            float2 d1 = pos[2].xy / abs(pos[2].w) - pos[0].xy / abs(pos[0].w);
            [branch]
            if (d1.x * d0.y > d0.x * d1.y)
            {
                // Triangle is visible - emit
                [unroll]
                for (int i = 0; i < 3; i++)
                {
                    Out.pos = pos[i];
                    // Other data processed here
                    // . . .
                    Stream.Append(Out);
                }
                Stream.RestartStrip();
            }
        }
    }
}
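The frustum test in Listing 1 is compact enough to be easy to misread, so here is a CPU-side sketch of the same arithmetic. It is an illustration only: Float4 and outsideOneSidePlane are names invented for this example, and the function mirrors just the four side-plane terms used in the listing.

```cpp
#include <algorithm>

struct Float4 { float x, y, z, w; };

inline float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// True when all three clip-space vertices lie outside the same side plane
// (left, bottom, right or top), so the triangle cannot be visible on this face.
inline bool outsideOneSidePlane(const Float4& a, const Float4& b, const Float4& c) {
    const Float4* verts[3] = {&a, &b, &c};
    float product[4] = {1.0f, 1.0f, 1.0f, 1.0f};
    for (const Float4* p : verts) {
        // Same four terms as pos.xyxy * float4(-1,-1,1,1) - pos.w in the listing:
        // each one is positive only when the vertex is outside that plane.
        const float t[4] = {-p->x - p->w, -p->y - p->w, p->x - p->w, p->y - p->w};
        for (int i = 0; i < 4; ++i) product[i] *= saturate(t[i]);
    }
    // A non-zero product means every vertex was outside that one plane.
    return product[0] != 0.0f || product[1] != 0.0f ||
           product[2] != 0.0f || product[3] != 0.0f;
}
```

A triangle that straddles a plane, or whose vertices are outside different planes, is not rejected by this test. That is conservative: nothing visible is culled, and the rasterizer still clips whatever gets through.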
There are several “gotchas” associated with geometry shaders. First, one has to ensure that when computing per-edge data, one matches one's output with the output for adjacent triangles. For example, you don't want to create “T” intersections when you add new vertices. Second, triangle winding must be kept in mind – backface culling occurs in the rasterizer after the GS. Also, the ordering of triangles generated by the geometry shader may not agree with the correct back to front ordering needed for transparency. For example, generating fur shells from a mesh won't draw in the right order.
Another major improvement in D3D10 is far more flexibility regarding Multiple Render Targets (MRTs). D3D10 supports up to eight render targets in any of eight slots. For performance reasons, you should always use the lowest MRT slots available and you should not leave holes in MRT slot assignment. A limitation of MRTs is that you cannot mix MSAA and non-MSAA render targets. Even if MRT #0 isn't bound, target #0 alpha is used for Alpha-to-Coverage.
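A debug-time check for the "no holes" advice could look like the sketch below. The function name and the bool-array representation of the bound slots are assumptions made for this example; a real application would fill the array from whatever bookkeeping it keeps for its render target bindings.

```cpp
#include <array>
#include <cstddef>

const std::size_t kMaxRenderTargets = 8;  // eight render target slots

// 'bound[i]' is true when slot i has a render target. Returns true when the
// bound slots form one block starting at slot 0 with no unbound slot in between.
inline bool slotsArePacked(const std::array<bool, kMaxRenderTargets>& bound) {
    bool seenGap = false;
    for (bool isBound : bound) {
        if (!isBound) {
            seenGap = true;
        } else if (seenGap) {
            return false;  // a bound slot after an unbound one: there is a hole
        }
    }
    return true;
}
```

Targets bound to slots 0, 1 and 2 pass the check, while slots 0, 1 and 4 fail it.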
Alpha-to-coverage works even without multisample antialiasing, although this can produce a screen-door effect. The implementation is IHV dependent, and the specification is not as strict as might be desired, so don't make assumptions about how alpha-to-coverage is implemented.
You must remember to enable render target masks, as shown in Listing 2.
LISTING 2
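// enable MRT: each render target slot in use needs its own write mask
D3D10_BLEND_DESC bd;
ZeroMemory(&bd, sizeof(bd)); // start from a known state before filling in fields
bd.RenderTargetWriteMask[0] = 0x0f; // 0x0f = write R, G, B and A
bd.RenderTargetWriteMask[1] = 0x0f;
bd.RenderTargetWriteMask[4] = 0x0f;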
There are, additionally, relatively minor changes that may alter the display significantly. DX10's sRGB implementation differs from that of DX9: the gamma curve is specified differently, and blending and filtering are done in linear space. Obviously, this could result in quite different-looking displays. The alpha-channel blend is no longer a separate, optional state, so don't forget to set it using the SrcBlendAlpha, DestBlendAlpha, and BlendOpAlpha members of D3D10_BLEND_DESC.
Speaking of alpha blend, D3D10 supports dual-source color blending, which uses two pixel shader outputs in the blending equation. However, this does not work with Multiple Render Targets!
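To make the earlier sRGB remark concrete, the sketch below compares a 50/50 blend done on linear values with the same blend done directly on the encoded values. It assumes the standard piecewise sRGB transfer function and says nothing about how any particular D3D9 hardware behaved; the function names are invented for the example.

```cpp
#include <cmath>

// Standard piecewise sRGB decode: encoded value in [0,1] -> linear light.
inline float srgbToLinear(float c) {
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Standard piecewise sRGB encode: linear light in [0,1] -> encoded value.
inline float linearToSrgb(float l) {
    return l <= 0.0031308f ? l * 12.92f
                           : 1.055f * std::pow(l, 1.0f / 2.4f) - 0.055f;
}

// 50/50 blend performed on linear values, then re-encoded.
inline float blendHalfLinear(float srcEncoded, float dstEncoded) {
    const float mixed =
        0.5f * srgbToLinear(srcEncoded) + 0.5f * srgbToLinear(dstEncoded);
    return linearToSrgb(mixed);
}

// The same blend applied directly to the encoded values, for comparison.
inline float blendHalfEncoded(float srcEncoded, float dstEncoded) {
    return 0.5f * srcEncoded + 0.5f * dstEncoded;
}
```

Blending white over black gives an encoded value of roughly 0.735 in linear space against exactly 0.5 when the encoded values are averaged. A difference of that size is why ported content can look noticeably brighter or darker in blended regions.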