AMD Developer Central

GPU PerfStudio

GPU PerfStudio is a real-time performance analysis tool designed to help tune the graphics performance of your DirectX 9, DirectX 10, and OpenGL applications. It displays real-time API, driver, and hardware data, which can be visualized using flexible plots and bar charts. The application being profiled may be executed locally or remotely over the network. GPU PerfStudio lets the developer override key rendering states in real time for rapid bottleneck detection. An auto-analysis window can be used to identify performance issues at various stages of the graphics pipeline. No special drivers or code modifications are needed to use GPU PerfStudio.

» Versions
» Counter Descriptions
» State Overrides
» Screenshots
» Downloads
» Setup/Installation

AMD GPU PerfStudio Versions
What’s New in Version 1.2?
  • Client and server support for OpenGL on XP and Vista

What’s New in Version 1.1?
  • Client and server support for DirectX® 10/Vista/Radeon™ HD 2000 family
  • Automatic bottleneck detection
  • Cell formatting
  • Enhanced filtering
  • Separate render target and back buffer state overrides
  • Drag and drop for plots and bar charts
  • Selectable UI antialiasing
  • Flexible bar charts

    Counter Descriptions
    GPU PerfStudio provides statistics on every D3D call executed per frame, in addition to the hardware and driver level counters described below. Real-time graphs and bar charts can easily be created for all numeric data. Not all counters are available on all cards.

    Hardware Counter: Description
    % Hardware Utilization: Percent time GPU is busy
    % Vertex Wait for Pixel: Percent time vertex processing is waiting for pixel processing to finish (can indicate slow pixel shaders)
    % Pixel Wait for Vertex: Percent time pixel processing is waiting for vertex processing to finish
    Pre-clip Primitives: Primitive count before clipping
    Post-clip Primitives: Primitive count after clipping
    % Blended Pixels: Percent of total pixels drawn with blending enabled
    ALU to Texture Instruction Ratio: Ratio between pixel shader ALU and texture instructions
    % Pixels Passed Z-test: Percent of pixels which passed the Z-test
    Overdraw: Total number of pixels drawn divided by the Overdraw counter resolution. This counter can also be representative of the number of render targets in use.
    Texture Cache Miss Rate: Texture cache miss rate in bytes per pixel
    Post HiZ Sample Count: Number of samples after HyperZ
    Post TopZ Pixel Count: Pixels after early Z culling has taken place
    Post Shader Pixel Count: Pixels after shading and alpha test have taken place
    TopZ Reject Rate: Rate of pixel rejection due to early Z test
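
As a rough illustration of how the two wait counters and the pixel count counters above might be read together, here is a minimal Python sketch. It does not use any GPU PerfStudio API: the function names, the 10-point margin, and the idea of dividing one pixel count by the other are assumptions made for this example, and the input values are ones you would copy by hand from the tool's display.

```python
from typing import Optional


def likely_slower_stage(vertex_wait_pct: float, pixel_wait_pct: float,
                        margin: float = 10.0) -> str:
    """Guess which stage is slower from the two wait percentages.

    Hypothetical heuristic: if vertex processing waits on pixel processing
    much more than the reverse, the pixel side is probably the slower one.
    The margin is an arbitrary threshold chosen for this sketch.
    """
    if vertex_wait_pct - pixel_wait_pct > margin:
        return "pixel"
    if pixel_wait_pct - vertex_wait_pct > margin:
        return "vertex"
    return "unclear"


def shaded_survival_ratio(post_topz_pixels: int,
                          post_shader_pixels: int) -> Optional[float]:
    """Fraction of pixels left after early Z that also survive shading
    and alpha test. Assumes both counts cover the same frame."""
    if post_topz_pixels <= 0:
        return None  # nothing reached the shader, ratio is undefined
    return post_shader_pixels / post_topz_pixels


if __name__ == "__main__":
    print(likely_slower_stage(45.0, 3.0))
    print(shaded_survival_ratio(1000, 750))
```

A large "% Vertex Wait for Pixel" value with a small "% Pixel Wait for Vertex" value returns "pixel" here, which matches the hint in the counter description that this pattern can indicate slow pixel shaders.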
     
    Driver Data Counter: Description
    Framerate: Frames per second
    LocalTextureMem: Local texture memory used
    AGPTextureMem: AGP texture memory used
    LocalVBIBMem: Local vertex buffer and index buffer memory used
    AGPVBIBMem: AGP vertex buffer and index buffer memory used
    TextureUpload: Texture data uploaded
    VBIBUpload: Vertex buffer and index buffer data uploaded
    PrimsPerRSChange: Primitives rendered per render state change
    PrimsPerTSChange: Primitives rendered per texture state change
    PrimsPerVSChange: Primitives rendered per vertex shader change
    PrimsPerPSChange: Primitives rendered per pixel shader change
    PrimsPerVSCChange: Primitives rendered per vertex shader constant change
    PrimsPerPSCChange: Primitives rendered per pixel shader constant change
    FlipStall: Stalls on frame buffer flip
    VBStall: Stalls on vertex buffer
    GeometryBufferAllocatedDefault, GeometryBufferAllocatedImmutable, GeometryBufferAllocatedDynamic, GeometryBufferAllocatedStaging, GeometryBufferAllocated: Allocated memory for vertex and index buffers and stream output
    GeometryBufferUsedPercentage: Percentage of allocated geometry buffer memory used
    ConstantBufferAllocatedDefault, ConstantBufferAllocatedImmutable, ConstantBufferAllocatedDynamic, ConstantBufferAllocatedStaging, ConstantBufferAllocated: Allocated memory for constant buffers
    ConstantBufferUsedPercentage: Percentage of allocated constant buffer memory used
    RenderTargetAllocatedDefault, RenderTargetAllocatedImmutable, RenderTargetAllocatedDynamic, RenderTargetAllocatedStaging, RenderTargetAllocated: Allocated memory for render targets
    RenderTargetUsedPercentage: Percentage of allocated render target memory used
    TextureDepthStencilShaderAllocatedDefault, TextureDepthStencilShaderAllocatedImmutable, TextureDepthStencilShaderAllocatedDynamic, TextureDepthStencilShaderAllocatedStaging, TextureDepthStencilShaderAllocated: Allocated memory for ShaderResources, DepthStencil buffers and Textures
    TextureDepthStencilShaderUsedPercentage: Percentage of allocated TextureDepthStencilShader memory used
    PrimsPerDepthStencilStateChange: Primitives rendered per depth stencil state change
    PrimsPerBlendStateChange: Primitives rendered per blend state change
    PrimsPerGeometryShaderChange: Primitives rendered per geometry shader change
    PrimsPerPSSamplerStateChange: Primitives rendered per pixel shader sampler state change
    PrimsPerVSSamplerStateChange: Primitives rendered per vertex shader sampler state change
    PrimsPerGSSamplerStateChange: Primitives rendered per geometry shader sampler state change
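
The "PrimsPer...Change" and "...UsedPercentage" counters above are simple ratios. The following Python sketch shows the arithmetic such counters describe, as a reading aid only. The function names and sample numbers are hypothetical, and the way GPU PerfStudio itself computes these values is not documented on this page.

```python
from typing import Optional


def prims_per_change(primitives: int, state_changes: int) -> Optional[float]:
    """Primitives rendered per state change over some interval.

    Returns None when no state change occurred, since the ratio is
    undefined in that case (a choice made for this sketch).
    """
    if state_changes == 0:
        return None
    return primitives / state_changes


def used_percentage(used_bytes: int, allocated_bytes: int) -> float:
    """Share of allocated memory that is actually in use, in percent."""
    if allocated_bytes == 0:
        return 0.0
    return 100.0 * used_bytes / allocated_bytes


if __name__ == "__main__":
    # Hypothetical frame: 12,000 primitives drawn across 40 pixel shader changes.
    print(prims_per_change(12000, 40))
    # Hypothetical pool: 256 bytes used out of 1024 allocated.
    print(used_percentage(256, 1024))
```

A low primitives-per-change value means few primitives are drawn between state changes, which is usually a sign that draw calls could be batched or sorted by state more effectively.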

    State Overrides

    Override: Description/Possible Bottleneck
    Force 2x2 Textures: Is texture bandwidth (large textures) affecting performance?
    Force Disable Texture Filtering: Are expensive texture filtering modes affecting performance?
    Force 1x1 Scissor Region: Identifies vertex processing bottlenecks (by removing most pixel processing)
    Force Simple Pixel Shaders: Identifies expensive pixel shaders
    Force Skip Draw*Prim Calls: Identifies non-GPU bottlenecks (by removing most 3D graphics work)
    Force Z Test Enable: Identifies z-order performance issues
    Force Z Write Enable: Identifies z-order performance issues
    Force Alpha Blend Enable: Identifies alpha-blending performance issues
    Force Alpha Test Enable: Can identify problems related to early Z test
    Force Cull Mode: Can show culling efficiency
    Force Fill Mode: Used for debugging and identifying vertex density
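
The overrides above are typically used by comparing the frame rate with and without each override applied. The Python sketch below illustrates that reasoning for five of them. It is not how the tool's auto-analysis works and it calls no GPU PerfStudio API: the dictionaries, the `interpret` function, and the 15% speedup threshold are all assumptions for this example, and the frame rates are values you would note down manually.

```python
# Overrides where a clear speedup points at the work that was removed.
SPEEDUP_MEANS = {
    "Force 2x2 Textures": "texture bandwidth is a likely cost",
    "Force Disable Texture Filtering": "texture filtering modes are a likely cost",
    "Force Simple Pixel Shaders": "pixel shaders are a likely cost",
}

# Overrides where the absence of a speedup is the informative result.
NO_SPEEDUP_MEANS = {
    "Force 1x1 Scissor Region": "vertex processing may be the bottleneck",
    "Force Skip Draw*Prim Calls": "the bottleneck may be outside the GPU",
}


def interpret(baseline_fps, override_fps, min_speedup=1.15):
    """Turn manually recorded frame rates into a list of findings.

    baseline_fps: frame rate with no override active.
    override_fps: dict mapping override name to the frame rate seen with it.
    min_speedup: hypothetical threshold for calling a change significant.
    """
    findings = []
    for name, fps in override_fps.items():
        sped_up = fps >= baseline_fps * min_speedup
        if sped_up and name in SPEEDUP_MEANS:
            findings.append(f"{name}: {SPEEDUP_MEANS[name]}")
        elif not sped_up and name in NO_SPEEDUP_MEANS:
            findings.append(f"{name}: {NO_SPEEDUP_MEANS[name]}")
    return findings


if __name__ == "__main__":
    measured = {
        "Force 2x2 Textures": 61.0,
        "Force Simple Pixel Shaders": 95.0,
        "Force 1x1 Scissor Region": 140.0,
        "Force Skip Draw*Prim Calls": 300.0,
    }
    for line in interpret(60.0, measured):
        print(line)
```

With these made-up numbers only the simple pixel shader override produces a finding: the frame rate rises well past the threshold, so the original pixel shaders are the likely cost.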

    GPU PerfStudio Vista Screenshots


    Figure 1: GPU PerfStudio on Vista (screenshot).

    Important notes:
  • DX9 applications on Vista are supported only in API mode; no hardware data or driver data is available.
  • Applications to be profiled cannot be started with batch files or “launcher” applications (an exe which starts another exe). To pass command line arguments, use a shortcut (link) instead.
  • Catalyst 7.10 and later are not supported for DX9 applications on systems running XP with HD 2000 or HD 3000 family graphics cards; use Catalyst 7.9 instead.

    Downloads
    » GPU PerfStudio 1.2 installer: GPUPerfStudio-v1.2.msi
    » GPU PerfStudio 1.2 release notes: GPUPerfStudio-ReleaseNotes.txt
    » GPU PerfStudio 1.2 documentation: GPUPerfStudioHelp.pdf

    Contact Us

    Please send any feedback, questions or suggestions to gputools.support@amd.com.


    Setup/Installation
    GPU PerfStudio requirements:
  • Windows XP SP2 / Vista (32-bit)
  • Radeon 9500 or better (for hardware data, server machine; CrossFire not supported)
  • CATALYST 7.12 (for hardware data, server machine)

    Before installing GPU PerfStudio, uninstall all previous versions of GPU PerfStudio and PerfDash. GPU PerfStudio works best when run remotely. You must run the GPU PerfStudio installer on both machines: the “client” machine running the GPU PerfStudio application and the “server” (target) machine running the application to be profiled. If you select a custom install, you can install only the server or only the client on an individual machine.