OpenCL™ Optimizations on ImageMagick: Convert, Edit, and Compose Images
Lihua Zhang | MulticoreWare Inc., General Manager of China Operations | CG-4192
ImageMagick is an open source software suite to create, edit, compose, or convert bitmap images. The goal is to optimize it with OpenCL to significantly improve its image processing efficiency; thus far we have achieved up to 15x speedup on some image processes on Trinity platforms. We implemented GPU JPEG decode and encode so that the amount of data to move is significantly reduced. We utilized the AMD APU’s zero copy capability to improve the data transfer efficiency.
Performance Evaluation of AMD-APARAPI Using Real World Applications
Prakash Raghavendra | NITK, Surathkal, Associate Professor | WT-4236
Java Aparapi allows Java developers to take advantage of the computational power of GPU and APU devices by executing Java parallel code fragments on the GPU rather than being confined to the local CPU. Performance analysis is done by running real world problems programmed in Java using Aparapi. Each program is written in multi-threaded Java for proper comparison.
Advanced OpenCL™ Debugging and Profiling using AMD tools
Avi Shapira | AMD, Director, Developer Tools | PT-4331
Developing robust parallel computing applications is difficult. In this talk we will introduce the audience to AMD’s developer tools and display advanced debugging and profiling techniques that help locate hard-to-find OpenCL related bugs and performance issues.
An Overview of HSAIL
Benedict Gaster | AMD, Programming Models Architect | Norman Rubin | AMD, Fellow | PL-4297
HSAIL is a new virtual byte code and virtual machine designed for parallel compute on heterogeneous devices. HSAIL makes it easy to compile high performance code both for current and future architectures. HSAIL programs will run unchanged on future hardware, and HSAIL has been architected to support modern high level programming languages such as Java and C++. Unlike previous GPU byte codes, the HSAIL memory model uses a formal design based on acquire/release semantics.
Mike Chu | AMD, Member of Technical Staff, Heterogeneous System Architecture | Benedict Gaster | AMD, Programming Models Architect | Lee Howes | AMD, MTS, Heterogeneous System Software | PL-3660
With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these systems are low-level, even when considering them as systems programming models. OpenCL is effectively an extended subset of C99, limited to the type unsafe procedural abstraction that C has provided for more than 30 years. In this talk we introduce OpenCL C++, an object-oriented programming model (based on C++11) for heterogeneous computing and an alternative for developers targeting OpenCL enabled devices.
OpenCL Acceleration of x264
Steven Borho | MulticoreWare Inc, Solution Architect | MM-4203
x264 is the world’s most popular H.264 video encoder, and is highly optimized with hand written vectorized assembly code. This presentation will describe how we ported the lookahead (pre-encode) processing to OpenCL for improved performance and encode efficiency.
Bolt: A C++ Template Library for HSA
Benjamin Sander | AMD, HSA | PL-4245
In this talk we describe a C++ template library optimized for AMD’s Heterogeneous System Architecture. In many cases developers will be able to create a single source code base, which runs efficiently on both the CPU and the GPU. We provide examples that show a dramatic reduction in lines of code. Finally, we show how the library allows developers to easily access the unique capabilities of HSA, including shared virtual memory, tight CPU and GPU communication, and advanced queuing capabilities.
OpenCL Enabled Face Detection Plug-in for IrfanView
Yao Wang | MulticoreWare Inc., Project Manager | MM-4170
Our Face Detection plug-in allows IrfanView, a popular freeware graphic viewer, to filter portraits from a large photo gallery. It employs an AdaBoost classifier as its core, along with a few pre-processing steps including JPEG decode, resizing, and histogram computation. The operations form a pipeline in which every step can be processed on the CPU or the GPU. The most computation-intensive part, Haar feature face detection, is a good problem to solve with the GPU. In this session, we’ll describe our OpenCL port of OpenCV’s object detection method for AMD’s GPU/APU, our optimizations on the face detection pipeline, and the performance speedup we have seen on the APU.
Compute in the Future of Gaming
Gareth Thomas | Codemasters, Principal Programmer | CG-4292
With DX11 capable hardware now the norm for gamer spec PCs and the next generation of consoles on the horizon, AAA games development is about to take a very interesting turn towards compute-based solutions. This talk will offer an insight into some of the techniques now possible with compute and what this means for games over the next 5 years.
GPU Compute in Games, Present and Future
Dan Baker | Firaxis Games, Architect | CG-4293
Although Graphics Processing Units (GPUs) have been able to accelerate many general-purpose algorithms for years, the wide range of capabilities across different GPU models has made it difficult to deploy GPU-accelerated code across a broad consumer base. As DirectX 11-class hardware with compute shaders becomes ubiquitous, it is now feasible to use GPU-compute tasks in primary code paths instead of only peripheral ones. In this talk we will discuss how Firaxis Games used GPU compute tasks in Civilization 5, and discuss how we are moving CPU code to the GPU for future projects.
The Future of PC Gaming
Dan Baker | Firaxis Games | Mark Caldwell | Plaor | Jon Peddie | Jon Peddie Research | Rex Sikora | PopCap | Chris Taylor | Gas Powered Games | Gareth Thomas | Codemasters | CG-4306
An AMD panel of leading PC game industry experts discusses the future of PC gaming, drawing on their experience in the industry, and how new trends in Cloud, Social, Free-to-Play, Subscription, and Online will impact it.
HSA From A HPC Usage Perspective
Vinod Tipparaju | AMD | HPC-4740
An explanation of several programming models that are being developed by the HPC community to extend the existing programming model to work on heterogeneous computing systems, including how to maintain data consistency across networks, CPUs, and GPUs. In this discussion we describe in detail how addressing such issues has been simplified by the Heterogeneous System Architecture (HSA), the HSA runtime, and OpenCL 2.0.
Can GPGPU Programming be Liberated from the Data-Parallel Bottleneck?
Benedict Gaster | AMD, Programming Models Architect | Lee Howes | AMD, MTS, Heterogeneous System Software | PL-4130
With the success of programming models such as Khronos’ OpenCL and NVIDIA’s CUDA, heterogeneous computing is going mainstream. A limitation of the OpenCL/CUDA programming models is that to date they have really reflected the GPU programming model of the previous decade: they have focused on fine-grained data-parallel workloads. We will introduce a model of braided parallelism and an object-oriented programming model for heterogeneous computing.
Accelerating OpenCV on AMD GPUs with OpenCL
Shengen Yan | Institute Of Software Chinese Academy Of Science, Student | PL-4312
OpenCV is a widely used library of programming functions for real time computer vision. Our work is to implement and maintain an OpenCL version of OpenCV in which all frequently used functions are implemented and optimized with OpenCL. We have implemented and optimized over 50 core functions and an advanced application on AMD GPUs.
Harnessing GPU Compute with C++ AMP – Part 1
Daniel Moth | Microsoft, Principal Program Manager | PT-3601
C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. After a quick overview of the technology, covering its goals and how it differs from other approaches, we will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.
Harnessing GPU Compute with C++ AMP – Part 2
Daniel Moth | Microsoft, Principal Program Manager | PT-3602
C++ AMP is an open specification for taking advantage of accelerators like the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. We will dive into the programming model and its modern C++ API. This is a code heavy, interactive, two-part session, where every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.
GPU Acceleration of Interactive Large Scale Data Analytics Utilizing The Aparapi Framework
Ryan LaMothe | Pacific Northwest National Laboratory, Research Scientist | CC-4257
The extreme volume of unstructured data being generated worldwide that must be analyzed, abstracted, and understood has for years fueled extensive research to create intuitive, meaningful insights into the data. Capitalizing on human beings’ innate ability to rapidly comprehend visual imagery, PNNL’s IN-SPIRE processes this data and presents the results to users in a variety of intuitive and interactive visualizations. In this session, we will explore the use of AMD’s Aparapi to accelerate critical computational analytics and user interactions through high performance GPU computations.
IOMMUv2: The Ins and Outs of the Heterogeneous GPU Use
Paul Blinzer | AMD | Andrew Kegel | AMD, Manager, Research | CC-4334
Using the GPU in heterogeneous platform environments requires access to system memory that transcends the use scenarios of traditional IOMMU devices in system software. To that end, AMD introduced the IOMMUv2 device in the platform. In addition to the IOMMU functionality used by virtualization software, it provides hardware services that can be used as an HSA MMU for more efficient but secure memory access in application software. This session provides an overview of the hardware device and its many uses in system software for virtualization and HSA.
Task Manager (TM) – A Parallel Building Library for Heterogeneous Computing
Lihua Zhang | MulticoreWare Inc., General Manager of China Operations | PT-4169
TM is an open source ULL intended to assist developers in optimally programming parallel software on heterogeneous computing systems to achieve the highest performance, throughput, and utilization. Its primary function is to provide APIs for designing task based applications and implementing dynamic workload balancing across the entire heterogeneous system.
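The idea of dynamic workload balancing mentioned in this abstract can be illustrated with a minimal sketch. This is not TM's API, which the abstract does not describe; the function `run_tasks` and the worker labels "cpu" and "gpu" are hypothetical, and both workers here are ordinary CPU threads standing in for devices. Each worker pulls the next task from a shared queue as soon as it is idle, so a faster worker naturally ends up handling more tasks.

```python
import queue
import threading


def run_tasks(tasks, worker_names):
    """Run callables from one shared queue; idle workers pull the next task."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = [None] * len(tasks)
    handled_by = {name: 0 for name in worker_names}  # tasks completed per worker
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            value = task()
            with lock:
                results[index] = value
                handled_by[name] += 1

    threads = [threading.Thread(target=worker, args=(name,)) for name in worker_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled_by


if __name__ == "__main__":
    # Hypothetical workload: eight small tasks shared by two workers.
    tasks = [lambda n=n: n * n for n in range(8)]
    results, handled_by = run_tasks(tasks, ["cpu", "gpu"])
    print(results)
    print(handled_by)  # the split between workers varies from run to run
```

Results are stored by task index, so their order does not depend on which worker finished first; only the per-worker counts vary between runs.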
Quickly optimize OpenCL™ applications with SlotMaximizer
Matthieu Delahaye | MulticoreWare Inc., Senior Software Engineer | PT-4229
SlotMaximizer is a transformation tool that automatically tunes OpenCL™ kernels, helping to increase developer productivity. It helps developers obtain increased performance, higher throughput, and better hardware utilization from their kernels with minimal effort while maintaining a small, readable, and maintainable code base. SlotMaximizer enables developers to focus on their original problems and algorithm strategies and leave the details of optimizing the code to the compiler. This session will provide a user-oriented tutorial to Fusion developers.
Accelerating Sparse Linear Solvers on Heterogeneous Devices
Stefano Charissis | Victor Chang Cardiac Research Institute, Research Assistant | HC-4167
A common limiting factor in solving memory bound problems such as large sparse linear systems on compute devices is inter-device communication bandwidth. Our test case, simulation of electrical propagation in the heart using Finite Element Modelling (FEM), requires solving large systems of ODEs and PDEs over millions of nodes. Since this performance is both energy and cost efficient, we suggest that the APU architecture has potential as a key building block of HPC architectures for FEM.
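To show why sparse linear algebra tends to be memory bound, here is a minimal sketch of a sparse matrix-vector product in compressed sparse row (CSR) form, the core operation of many iterative solvers. It is not the presenters' solver, which the abstract does not describe; the function name `csr_matvec` and the small 3x3 matrix are assumptions made for illustration. Each stored nonzero is read once and used for a single multiply-add, together with an indirect read of the input vector, so the work is dominated by memory traffic rather than arithmetic.

```python
def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        total = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_indices[k]]  # one multiply-add per nonzero
        y[row] = total
    return y


if __name__ == "__main__":
    # Illustrative matrix (hypothetical data):
    # [[4, 0, 1],
    #  [0, 3, 0],
    #  [2, 0, 5]]
    values = [4.0, 1.0, 3.0, 2.0, 5.0]
    col_indices = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0]))
```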
clMAGMA: Heterogeneous High-Performance Linear Algebra with OpenCL
Stan Tomov | UTK, Research Director | PT-4255
The use of GPUs is becoming pervasive in high-performance scientific computing. To further accelerate and enable this transition, fundamental libraries often must be redesigned to fully exploit the power that GPUs present. We present clMAGMA – an OpenCL port of the current state-of-the-art developments on “Matrix Algebra on GPU and Multicore Architectures” (MAGMA). The new developments, combined with the use of OpenCL, will further propel clMAGMA’s portability and impact on the nation’s software cyber infrastructure.
German Andryryev | AMD | PT-4290
Fabric Engine: High-Performance Computing for Dynamic Languages
Peter Zion | Fabric Engine Inc., Chief Architect & Co-Founder | WT-4540
Fabric Engine is a high-performance processing engine that integrates with dynamic languages such as JavaScript and Python and exposes an interface for defining high-performance, multi-threaded, native computation. In this talk, Peter Zion will present an overview of Fabric’s architecture and how it can be used to bring multi-threaded performance to web applications.