At SIGGRAPH this year we were showcasing the latest generation of AMD’s OpenCL™ tools. On display we had both AMD APP Profiler, a comprehensive profiler for OpenCL-based applications, and AMD gDEBugger, a sophisticated tool that enables detailed source-level debugging of OpenCL kernels in addition to API-level debugging for OpenCL and OpenGL applications. At the show, both of these were being demonstrated as Visual Studio plugins. AMD APP Profiler already has a standalone version that supports Linux, and gDEBugger will have one soon.
AMD gDEBugger is the first OpenCL source-level debugger that can debug OpenCL kernels executing on a GPU, single-step into and through an OpenCL kernel, inspect local variable values, and review the contents of the arguments passed to the kernel. More impressively, AMD gDEBugger enables a developer to inspect local variable values for any work item within the executed ND-Range. A developer may even do this simultaneously for all work items in the ND-Range: AMD gDEBugger provides array and image views in which a developer may view a selected local variable’s value across the entire ND-Range, whether it is 1-D, 2-D, or 3-D.
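One rough way to picture the array view is as a flat list of per-work-item values rearranged by global ID. The sketch below is only an illustration of that idea for a 2-D ND-Range. The function name `as_grid`, the row-major storage order, and the sample data are assumptions made for the example and are not gDEBugger's actual data format or API.

```python
def as_grid(values, width, height):
    """Arrange a flat list of per-work-item values into rows by global ID.

    Assumption: the work item with global ID (x, y) stored its value at
    index y * width + x (row-major order).
    """
    if len(values) != width * height:
        raise ValueError("expected exactly one value per work item")
    return [values[y * width:(y + 1) * width] for y in range(height)]


# Hypothetical captured values of one local variable: each work item holds x * y.
width, height = 4, 3
captured = [x * y for y in range(height) for x in range(width)]

grid = as_grid(captured, width, height)
for row in grid:
    print(row)
```

Each printed row corresponds to one value of the second global ID, so a single unexpected entry points directly at the work item that produced it.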
The image view displays the local variable’s value as an image and provides tools for inspecting the point values within the image, and for highlighting ranges of values that may be of interest to the developer. In itself this is all very impressive, but it is not what most impressed the software developers I was talking to at SIGGRAPH. What really impressed them was that this debugging was done on a single GPU, the same one driving the three displays on the demonstration system. You see, Nvidia’s CUDA debugger requires two separate GPUs, one for executing the CUDA code and the other for driving the debugger’s display, and Nvidia does not have tools that debug OpenCL kernel code on the GPU.
AMD APP Profiler is a comprehensive profiling solution that enables a developer to quickly home in on the issues impacting the performance of an OpenCL solution, and hence to quickly optimize it. At a high level, AMD APP Profiler provides a time-based view illustrating the relationship between OpenCL API function calls, data transfers, and kernel executions. For each of these, a developer can see the respective begin and end points, and can query performance details for the operation. The visible gaps between kernel executions, data transfers, or API functions make the opportunities for application optimization at the API level self-evident.
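To make the idea of reading gaps off a timeline concrete, here is a minimal sketch that finds idle spans between operations, given their begin and end times. The tuple layout, the function name `find_idle_gaps`, and the timings are hypothetical and chosen for illustration. They are not AMD APP Profiler's output format, and the tool presents this information visually rather than through such a function.

```python
def find_idle_gaps(events, min_gap_ms=0.0):
    """Return (gap_start, gap_end) spans during which no event is active.

    `events` is a list of (label, start_ms, end_ms) tuples; this layout is
    an assumption made for the example.
    """
    ordered = sorted(events, key=lambda event: event[1])
    gaps = []
    if not ordered:
        return gaps
    busy_until = ordered[0][2]
    for _label, start, end in ordered[1:]:
        if start - busy_until > min_gap_ms:
            gaps.append((busy_until, start))
        # Overlapping operations extend the busy period rather than reset it.
        busy_until = max(busy_until, end)
    return gaps


# Hypothetical timeline: a write, a kernel, then a read overlapping a second kernel.
timeline = [
    ("clEnqueueWriteBuffer", 0.0, 2.0),
    ("kernel_a", 2.0, 5.0),
    ("clEnqueueReadBuffer", 7.5, 9.0),
    ("kernel_b", 8.0, 12.0),
]

for gap_start, gap_end in find_idle_gaps(timeline):
    print(f"idle from {gap_start} ms to {gap_end} ms")
```

In this made-up timeline the only idle span falls between the end of `kernel_a` and the start of the read, which is the kind of gap a developer would look to close by overlapping transfers with execution.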
Once a developer is satisfied with API-level optimization, she may then move on to kernel optimization. AMD APP Profiler provides detailed access to performance counters for a kernel. These provide detailed information about the execution of the kernel, allowing her to quickly identify performance issues. Perhaps the kernel is inadvertently using the complete path to memory rather than the fast path, or perhaps there are memory access conflicts. The detailed performance counter information makes such issues immediately evident.
Mark Ireton is the Product Manager for Compute Solutions at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.