| Webinar Topics |
Videos and Presentations |
CodeXL Overview and Demonstration
AMD CodeXL is a comprehensive tool suite that enables developers to harness the benefits of AMD CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, making it easier for software developers to enter the era of heterogeneous computing. |
|
Performance Evaluation of APARAPI Using Real World Applications
Java APARAPI (Java A PARallel API) allows Java developers to take advantage of the computational power of GPU and APU devices by executing parallel Java code fragments on the GPU rather than being confined to the local CPU. This presentation evaluates the performance of APARAPI when executing parallel Java code on the GPU via OpenCL. The analysis is done by running real world problems programmed in Java using APARAPI. Each program is also written in multi-threaded Java to allow a proper comparison. There will be around 15 commonly used, well-known real world programs. Some tuning has also been done in the APARAPI library. |
|
| Overview of HSAIL – the basis for implementing an HSA platform agnostic open-source OpenCL runtime
HSAIL is a new virtual byte code and virtual machine designed for parallel compute on heterogeneous devices. HSAIL makes it easy to compile high performance code both for current and future architectures. HSAIL programs will run unchanged on future hardware. Unlike AMDIL, which is the graphics byte code, HSAIL has been architected to support modern high level programming languages such as Java and C++. This talk will introduce HSAIL at a high level and go over the virtual machine. Next we will talk about the compilation model, the reasons for a byte code rather than an exposed ISA, and how HSAIL opens up HSA hardware to compiler and tool developers. We will review how HSAIL is different from PTX/LLVM and Java bytecode. Finally, we will go over one of HSAIL's important aspects: the memory model. Unlike previous GPU byte codes, the HSAIL memory model uses a formal design based on acquire/release semantics. |
|
| Graphics Core Next Architecture Overview
Designed to push not only the boundaries of DirectX® 11 gaming, the GCN Architecture is also AMD’s first design specifically engineered for general computing. Equipped with up to 32 compute units (2048 stream processors), each containing a scalar coprocessor, AMD’s 28nm GPUs are more than capable of handling workloads, and programming languages, traditionally exclusive to the processor. Coupled with the dramatic rise of GPU-aware programming languages like C++ AMP and OpenCL™, the GCN Architecture is truly the right architecture for the right time. Participate in this webinar to learn how you can take advantage of this new architecture in your GPGPU programs. |
Slide Deck |
MXPA – The Multicore Cross-Platform Architecture for Performance-Portable Computing (Guest Presenter from MulticoreWare Inc.)
We believe that one OpenCL source implementation should perform well on all architectures. MXPA is an OpenCL runtime and compiler enabling the same code you optimized for your GPU to get portable performance for architectures from multicore x86 to ARM to DSPs.
The key features of MXPA are:
- An installable OpenCL platform for multicore x86 with performance comparable or superior to existing implementations, due to drastic reductions in barrier synchronization costs and effective SIMD usage of any native vector width
- Standalone development libraries for statically compiling and linking the MXPA runtime with applications, enabling delivery of complete OpenCL applications or libraries without dependencies on a client-installed runtime or requirements to distribute uncompiled source code
- Integration with other MulticoreWare OpenCL tools, such as GMAC, making it even easier to write OpenCL code for peak performance
|
|
| Heterogeneous Computing Tips & Tricks with OpenCL™
This webinar will focus on a comprehensive list of tips and tricks that AMD performance engineers use to help get the most out of their heterogeneous computing coding time and code performance. |
|
| Quickly Optimize OpenCL™ Applications with SlotMaximizer (Guest Presenter from MulticoreWare Inc.)
SlotMaximizer is a transformation tool that automatically tunes OpenCL™ kernels, helping to increase developer productivity. It helps developers obtain increased performance, higher throughput, and better hardware utilization from their kernels with minimal effort while maintaining a small, readable and maintainable code base. SlotMaximizer enables developers to focus on their original problems and algorithm strategies and leave the details of optimizing the code to the compiler. SlotMaximizer is already incorporated into the AMD Catalyst™ drivers as a preview and can be used by anyone developing applications using the AMD APP SDK. |
|
| Advanced OpenCL Debugging using AMD gDEBugger
Developing robust parallel computing applications is difficult. In this talk we will introduce the audience to gDEBugger, an OpenCL kernel source code debugger. We will display advanced debugging techniques that help locate hard-to-find OpenCL-related bugs. |
View Video |
| Heterogeneous Compute Features of AMD CodeAnalyst Performance Analyzer
This webinar will cover profiling OpenCL with CodeAnalyst. We will start with an overview of the features and the types of analysis performed, and include an example. |
View Video |
| Coordinating OpenCL Computations on one or more Heterogeneous Devices, presented by guest speaker Rob Farber (3 of 3)
This webinar continues the discussion of the nine article OpenCL Portable Parallelism series by Rob Farber on The Code Project. Articles 4, 5, and 8 will be discussed, demonstrating how to concisely utilize multiple command queues and to coordinate tasks across multiple heterogeneous devices such as the two GPU + CPU configuration used in the articles. Complete working code samples will be discussed, including a massively parallel random number test framework. In combination with a strong scaling execution model, the ability to choreograph asynchronous data movement and overlapped computations on multiple devices makes OpenCL a powerful development tool to consider for incorporating scalable, portable parallelism into your applications. |
|
| Accelerate Rendering by an Order of Magnitude with OpenCL plus a View to the Multi-core and Web-enabled Future, presented by guest speaker Rob Farber (2 of 3)
This webinar concludes the discussion of the nine article OpenCL Portable Parallelism series by Rob Farber on The Code Project. Articles 6, 7, and 9 will demonstrate how to use OpenCL to provide high-quality, fast rendering in combination with primitive restart, a new feature added to the OpenGL 3.1 standard. As CPUs add ever more cores, device fission lets OpenCL programmers partition the hardware capability to achieve the best resource usage. Concluding thoughts will include a discussion of WebCL, which allows the use of OpenCL inside a web browser. |
|
| Introducing OpenCL Portable Parallelism, presented by guest speaker Rob Farber (1 of 3)
This webinar will introduce the nine article OpenCL Portable Parallelism series by Rob Farber on The Code Project. Articles 1, 2 and 3 will be discussed, including (1) C and C++ APIs for OpenCL plus building and running applications, (2) OpenCL memory spaces, and (3) the OpenCL execution model. Complete code examples from each article will be discussed to help get started with OpenCL. In particular, the importance of the OpenCL strong scaling execution model will be highlighted along with other reasons to consider OpenCL for your application development. |
|
| Write Once Run Anywhere
This presentation shows how Aparapi, an API for expressing data parallel workloads in Java, can extend Java’s promise of ‘Write Once, Run Anywhere’ to include GPU devices. Existing Java OpenCL bindings require developers to code data parallel algorithms in OpenCL, provide explicit buffer transfers and execution requests at runtime, and, if OpenCL is unavailable, code a separate Java implementation and have an appropriate fallback strategy. Aparapi allows developers to code against a simple Java data parallel API. At runtime Aparapi attempts to execute on the GPU by converting bytecode to OpenCL; if OpenCL is unavailable, Aparapi will fall back to executing using a Java thread pool. We contrast Aparapi with other OpenCL Java bindings, describe how it works, and walk through some real world examples. We also discuss how to determine whether Aparapi is a viable option for your application. |
|
| Taming GPU Compute with C++ AMP
Developers today inject parallelism into their compute-intensive applications in order to take advantage of multi-core CPU hardware. Beyond CPUs, however, compute accelerators such as general-purpose GPUs can provide orders of magnitude speed-ups for data parallel algorithms. How can you as a C++ developer fully utilize this heterogeneous hardware from your Visual Studio environment? How can you benefit from this tremendous performance boost in your Visual C++ solutions without sacrificing developer productivity? The answers will be presented in this session about C++ Accelerated Massive Parallelism. |
Slide Deck |
| Advanced OpenCL™ Debugging using gDEBugger by Yaki Tebeka, AMD Fellow
Developing robust parallel computing applications is difficult. In this talk we will introduce the audience to gDEBugger, an OpenCL kernel source code debugger, integrated into Visual Studio™. We will display advanced debugging techniques that help locate hard-to-find OpenCL-related bugs. Join Yaki Tebeka, an AMD Fellow, responsible for AMD’s developer tools, for this live webinar. Yaki brings over 13 years of experience in software, focusing on 3D graphics, heterogeneous computing and developer tools. |
View Video |
Introduction to Parallel and Heterogeneous Computing (1 hour)
Learn how heterogeneous computing fits into the parallel computing paradigm, what problems it solves and what opportunities it presents. |
Video
Slide Deck |
Introduction to OpenCL (1 hour)
Learn about the benefits of OpenCL, the anatomy and architecture of OpenCL and the tools and drivers available. |
Video
Slide Deck |
GPU Architecture Overview (1 hour)
Learn about modern GPU architectures and place the devices in the context of the CPU technologies available today. Get specific insight into the latest AMD hardware, including the 5000 and 6000 series GPUs, and how this design affects software implementation. |
Video
Slide Deck |
OpenCL Programming in Detail (1.5 hours)
Learn about OpenCL application execution, resource setup, kernel programming and compiling, program execution, memory objects and synchronization. This webinar will also cover the OpenCL C language, including restrictions, data types, type casting and conversions, qualifiers, and built-in functions, in the context of an N-Body example. |
Video
Slide Deck |
Real World Application in OpenCL (1 hour)
Walk through the creation of a video processing application developed by one of our engineers and get a sense of what you might be able to do with OpenCL in your own applications. |
Video
Slide Deck |
Device Fission Extensions for OpenCL (1 hour)
Learn about the unique advantage that OpenCL has when it comes to Fission extensions. |
Video
Slide Deck |
Smoothed Particle Hydrodynamics (1 hour)
This webinar describes a project in computational fluid dynamics targeted for videogame applications. The Smoothed Particle Hydrodynamics (SPH) algorithm is a particle method for simulating viscous fluids like water, syrup and air. It is based on solving the incompressible Navier-Stokes equations of fluid mechanics using a particle formulation. This webinar shows you how to build an SPH simulation in OpenCL and discusses design tradeoffs. Source code for the simulation is available. |
Video
Slide Deck
Download the source code (.rar) |
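As a rough illustration of the particle formulation mentioned in the Smoothed Particle Hydrodynamics abstract, the sketch below estimates fluid density at each particle by summing kernel-weighted contributions from all particles. It is not taken from the webinar's source code: the poly6 smoothing kernel, the brute-force neighbor loop, and the function names `poly6` and `densities` are assumptions chosen for brevity, and the webinar's OpenCL implementation may differ.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel in 3D (an assumed kernel choice, not from the webinar)."""
    if r < 0.0 or r > h:
        return 0.0  # no contribution outside the smoothing radius h
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, mass, h):
    """Estimate density at each particle: rho_i = sum_j mass * W(|r_i - r_j|, h).

    Brute force O(n^2) over all particle pairs; every particle has the same mass.
    """
    rho = []
    for p_i in positions:
        total = 0.0
        for p_j in positions:
            total += mass * poly6(math.dist(p_i, p_j), h)
        rho.append(total)
    return rho
```

For example, `densities([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)], 1.0, 1.0)` returns one density per particle. A GPU version would typically assign one work-item per particle and replace the inner loop with a spatial grid lookup, which is one of the design tradeoffs such an implementation has to weigh.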
Optimizing a Convolution Algorithm (1 hour)
Learn about debugging OpenCL, performance measurements, and general optimization tips, and walk through optimizing a convolution algorithm. |
Video
Slide Deck |