OpenCL Optimizations on ImageMagick: Convert, Edit, and Compose Images

Lihua Zhang | MulticoreWare Inc., General Manager of China Operations | CG-4192

ImageMagick is an open-source software suite for creating, editing, composing and converting bitmap images. The goal of this work is to optimize it with OpenCL to significantly improve its image-processing efficiency; so far it achieves up to a 15x speedup on some image-processing operations on Trinity platforms. We implemented JPEG decoding and encoding on the GPU, which significantly reduces the amount of data that must be moved, and we used the zero-copy capability of AMD APUs to improve data-transfer efficiency.

Performance Evaluation of AMD-APARAPI Using Real World Applications

Prakash Raghavendra | NITK, Surathkal, Associate Professor | WT-4236

Aparapi allows Java developers to take advantage of the computational power of GPU and APU devices by executing parallel Java code fragments on the GPU rather than being confined to the local CPU. Performance is analyzed by running real-world problems programmed in Java using Aparapi. For a fair comparison, each program is also written in multi-threaded Java.

Advanced OpenCL™ Debugging and Profiling using AMD tools

Avi Shapira | AMD, Director, Developer Tools | PT-4331

Developing robust parallel computing applications is difficult. In this talk we will introduce the audience to AMD’s developer tools and demonstrate advanced debugging and profiling techniques that help locate hard-to-find OpenCL-related bugs and performance issues.

An Overview of HSAIL

Benedict Gaster | AMD, Programming Models Architect | Norman Rubin | AMD, Fellow | PL-4297

HSAIL is a new virtual byte code and virtual machine designed for parallel compute on heterogeneous devices. HSAIL makes it easy to compile high-performance code for both current and future architectures. HSAIL programs will run unchanged on future hardware, and HSAIL has been architected to support modern high-level programming languages such as Java and C++. Unlike previous GPU byte codes, the HSAIL memory model uses a formal design based on acquire/release semantics.
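The acquire/release idea that HSAIL’s memory model builds on is the same one C++11 exposes through `std::atomic`. The sketch below is written in standard C++11, not HSAIL, and is only an analogy for the semantics: a producer writes a payload and then publishes a flag with a release store, and a consumer that observes the flag with an acquire load is guaranteed to see the payload. The function name `message_passing` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Acquire/release message passing with standard C++11 atomics.
// An analogy for the semantics HSAIL's memory model is based on, not HSAIL code.
int message_passing() {
    int payload = 0;                 // ordinary (non-atomic) data
    std::atomic<bool> ready{false};  // synchronization flag

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    int observed = 0;
    std::thread consumer([&] {
        // Spin until the release store becomes visible.
        while (!ready.load(std::memory_order_acquire)) {
        }
        // The acquire load synchronizes with the release store, so the
        // earlier write to payload is guaranteed to be visible here.
        observed = payload;
        assert(observed == 42);
    });

    producer.join();
    consumer.join();
    return observed;
}
```

With relaxed ordering on both the store and the load, the consumer could legally see `ready == true` and still read a stale `payload`; the acquire/release pair is what rules that out.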

OpenCL C++

Mike Chu | AMD, Member of Technical Staff, Heterogeneous System Architecture | Benedict Gaster | AMD, Programming Models Architect | Lee Howes | AMD, MTS, Heterogeneous System Software | PL-3660

With the success of programming models such as Khronos’ OpenCL, heterogeneous computing is going mainstream. However, these systems are low-level, even when considered as systems programming models. OpenCL C is effectively an extended subset of C99, limited to the type-unsafe procedural abstraction that C has provided for more than 30 years. In this talk we introduce OpenCL C++, an object-oriented programming model (based on C++11) for heterogeneous computing and an alternative for developers targeting OpenCL-enabled devices.

OpenCL Acceleration of x264

Steven Borho | MulticoreWare Inc, Solution Architect | MM-4203

x264 is the world’s most popular H.264 video encoder and is highly optimized with hand-written vectorized assembly code. This presentation will describe how we ported the lookahead (pre-encode) processing to OpenCL for improved performance and encoding efficiency.

Bolt: A C++ Template Library for HSA

Benjamin Sander | AMD, HSA | PL-4245

In this talk we describe a C++ template library optimized for AMD’s Heterogeneous System Architecture. In many cases developers will be able to create a single source code base that runs efficiently on both the CPU and the GPU. We provide examples that show a dramatic reduction in lines of code. Finally, we show how the library allows developers to easily access the unique capabilities of HSA, including shared virtual memory, tight CPU-GPU communication, and advanced queuing capabilities.

OpenCL Enabled Face Detection Plug-in for IrfanView

Yao Wang | MulticoreWare Inc., Project Manager | MM-4170

Our Face Detection plug-in allows IrfanView, a popular freeware graphics viewer, to filter portraits from a large photo gallery. Its core is an AdaBoost classifier, preceded by a few pre-processing steps including JPEG decoding, resizing and histogram computation. These operations form a pipeline in which every step can be processed on the CPU or the GPU. The most computationally intensive part, Haar-feature face detection, is well suited to the GPU. In this session, we’ll describe our OpenCL port of OpenCV’s object detection method for AMD GPUs/APUs, our optimizations of the face detection pipeline, and the speedup we have seen on the APU.
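Haar-feature detectors in the Viola-Jones style, such as OpenCV’s cascade classifier, are commonly evaluated on an integral image (summed-area table), which lets the pixel sum of any rectangle be read with four lookups regardless of its size. The C++ sketch below is a generic CPU illustration of that data structure and of a two-rectangle feature; it is not the plug-in’s OpenCL code, and the function names `integral_image`, `rect_sum` and `haar_edge_feature` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a summed-area table padded with a zero row and column:
// ii[Y * (w + 1) + X] holds the sum of pixels in rows [0, Y) and columns [0, X).
std::vector<uint32_t> integral_image(const std::vector<uint8_t>& img, int w, int h) {
    assert(static_cast<int>(img.size()) == w * h);
    std::vector<uint32_t> ii((w + 1) * (h + 1), 0);
    for (int y = 0; y < h; ++y) {
        uint32_t row_sum = 0;  // running sum of row y up to column x
        for (int x = 0; x < w; ++x) {
            row_sum += img[y * w + x];
            ii[(y + 1) * (w + 1) + (x + 1)] = ii[y * (w + 1) + (x + 1)] + row_sum;
        }
    }
    return ii;
}

// Sum of the rw x rh rectangle whose top-left pixel is (x, y): four lookups.
uint32_t rect_sum(const std::vector<uint32_t>& ii, int w, int x, int y, int rw, int rh) {
    const int s = w + 1;  // row stride of the padded table
    return ii[(y + rh) * s + (x + rw)] - ii[y * s + (x + rw)]
         - ii[(y + rh) * s + x] + ii[y * s + x];
}

// Two-rectangle (edge) Haar feature: sum of the left half minus the right half.
int64_t haar_edge_feature(const std::vector<uint32_t>& ii, int w, int x, int y, int rw, int rh) {
    const int half = rw / 2;
    return static_cast<int64_t>(rect_sum(ii, w, x, y, half, rh))
         - static_cast<int64_t>(rect_sum(ii, w, x + half, y, half, rh));
}
```

Because each candidate window only reads the shared table, many windows can be evaluated independently, which is the kind of work a GPU spreads across its threads.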

Compute in the Future of Gaming

Gareth Thomas | Codemasters, Principal Programmer | CG-4292

With DX11-capable hardware now the norm for gamer-spec PCs and the next generation of consoles on the horizon, AAA game development is about to take a very interesting turn towards compute-based solutions. This talk will offer insight into some of the techniques now possible with compute and what they mean for games over the next five years.

GPU Compute in Games, Present and Future

Dan Baker | Firaxis Games, Architect | CG-4293

Although Graphics Processing Units (GPUs) have been able to accelerate many general-purpose algorithms for years, the wide range of capabilities across different GPU models has made it difficult to deploy GPU-accelerated code across a broad consumer base. As DirectX 11-class hardware with compute shaders becomes ubiquitous, it is now feasible to use GPU-compute tasks in primary code paths instead of only peripheral ones. In this talk we will discuss how Firaxis Games used GPU compute tasks in Civilization 5, and discuss how we are moving CPU code to the GPU for future projects.

The Future of PC Gaming

Dan Baker | Firaxis Games | Mark Caldwell | Plaor | Jon Peddie | Jon Peddie Research | Rex Sikora | PopCap | Chris Taylor | Gas Powered Games | Gareth Thomas | Codemasters | CG-4306

An AMD panel of leading PC game industry experts discusses the future of PC gaming, drawing on their experience in the industry, and how new trends in cloud, social, free-to-play, subscription and online gaming will shape it.

HSA From A HPC Usage Perspective

Vinod Tipparaju | AMD | HPC-4740

An explanation of several programming models being developed by the HPC community to extend existing programming models to heterogeneous computing systems, including how to maintain data consistency across networks, CPUs and GPUs. In this discussion we describe in detail how addressing such issues has been simplified by the Heterogeneous System Architecture (HSA), the HSA runtime, and OpenCL 2.0.

Can GPGPU Programming be Liberated from the Data-Parallel Bottleneck?

Benedict Gaster | AMD, Programming Models Architect | Lee Howes | AMD, MTS, Heterogeneous System Software | PL-4130

With the success of programming models such as Khronos’ OpenCL and NVIDIA’s CUDA, heterogeneous computing is going mainstream. A limitation of the OpenCL/CUDA programming models is that, to date, they have reflected the GPU programming model of the previous decade: they have focused on fine-grained data-parallel workloads. We will introduce a model of braided parallelism and an object-oriented programming model for heterogeneous computing.

Accelerating OpenCV on AMD GPUs with OpenCL

Shengen Yan | Institute of Software, Chinese Academy of Sciences, Student | PL-4312

OpenCV is a widely used library of programming functions for real-time computer vision. Our work is to implement and maintain an OpenCL version of OpenCV in which all frequently used functions are implemented and optimized with OpenCL. We have implemented and optimized over 50 core functions and an advanced application on AMD GPUs.

Harnessing GPU Compute with C++ AMP – Part 1

Daniel Moth | Microsoft, Principal Program Manager | PT-3601

C++ AMP is an open specification for taking advantage of accelerators such as the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. After a quick overview of the technology, its goals and how it differs from other approaches, we will dive into the programming model and its modern C++ API. This is a code-heavy, interactive, two-part session in which every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.

Harnessing GPU Compute with C++ AMP – Part 2

Daniel Moth | Microsoft, Principal Program Manager | PT-3602

C++ AMP is an open specification for taking advantage of accelerators such as the GPU. In this session we will explore the C++ AMP implementation in Microsoft Visual Studio 11. We will dive into the programming model and its modern C++ API. This is a code-heavy, interactive, two-part session in which every part of the library will be explained. Demos will include showing off the richest parallel and GPU debugging story on the market, in the upcoming Visual Studio release.

GPU Acceleration of Interactive Large Scale Data Analytics Utilizing The Aparapi Framework

Ryan LaMothe | Pacific Northwest National Laboratory, Research Scientist | CC-4257

The extreme volume of unstructured data being generated worldwide, which must be analyzed, abstracted and understood, has for years fueled extensive research into deriving intuitive, meaningful insights from that data. Capitalizing on human beings’ innate ability to rapidly comprehend visual imagery, PNNL’s IN-SPIRE processes this data and presents the results to users in a variety of intuitive and interactive visualizations. In this session, we will explore the use of AMD’s Aparapi to accelerate critical computational analytics and user interactions through high-performance GPU computations.

IOMMUv2: The Ins and Outs of the Heterogeneous GPU Use

Paul Blinzer | AMD | Andrew Kegel | AMD, Manager, Research | CC-4334

Using the GPU in heterogeneous platform environments requires access to system memory that goes beyond the use scenarios of traditional IOMMU devices in system software. To that end, AMD introduced the IOMMUv2 device into the platform. In addition to the IOMMU functionality used by virtualization software, it provides hardware services that can serve as an HSA MMU for more efficient yet secure memory access from application software. This session provides an overview of the hardware device and its many uses in system software for virtualization and HSA.

Task Manager (TM) – A Parallel Building Library for Heterogeneous Computing

Lihua Zhang | MulticoreWare Inc., General Manager of China Operations | PT-4169

TM is an open-source ULL intended to help developers program parallel software optimally on heterogeneous computing systems, achieving the highest performance, throughput and utilization. Its primary function is to provide APIs for designing task-based applications and implementing dynamic workload balancing across the entire heterogeneous system.
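TM’s own API is not described here, so the following is only a generic illustration of dynamic workload balancing, written with standard C++11 threads on a CPU: instead of receiving a fixed slice of work up front, each worker claims the next task from a shared atomic counter, so faster workers naturally end up executing more tasks. The function `run_dynamic` and its parameters are hypothetical and do not reflect TM’s interface.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Runs tasks 0..num_tasks-1 on num_workers threads. Each worker repeatedly
// claims the next unclaimed task index, so load is balanced at run time.
// `task` must be safe to call concurrently for different indices.
// Returns how many tasks each worker executed.
std::vector<int> run_dynamic(int num_tasks, int num_workers,
                             const std::function<void(int)>& task) {
    assert(num_workers > 0);
    std::atomic<int> next{0};                 // index of the next unclaimed task
    std::vector<int> executed(num_workers, 0);
    std::vector<std::thread> workers;
    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&, w] {
            for (;;) {
                int i = next.fetch_add(1);    // atomically claim one task
                if (i >= num_tasks) break;    // no work left
                task(i);
                ++executed[w];                // each worker touches only its own slot
            }
        });
    }
    for (auto& t : workers) t.join();
    return executed;
}
```

A static split would give every worker the same share even if some finish early; claiming one task at a time trades a little synchronization overhead for keeping every worker busy until the queue is empty.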

Quickly optimize OpenCL™ applications with SlotMaximizer

Matthieu Delahaye | MulticoreWare Inc., Senior Software Engineer | PT-4229

SlotMaximizer is a transformation tool that automatically tunes OpenCL™ kernels, helping to increase developer productivity. It helps developers obtain increased performance, higher throughput and better hardware utilization from their kernels with minimal effort, while maintaining a small, readable and maintainable code base. SlotMaximizer lets developers focus on their original problems and algorithmic strategies and leave the details of optimizing the code to the compiler. This session will provide a user-oriented tutorial for Fusion developers.

Accelerating Sparse Linear Solvers on Heterogeneous Devices

Stefano Charissis | Victor Chang Cardiac Research Institute, Research Assistant | HC-4167

A common limiting factor in solving memory-bound problems, such as large sparse linear systems, on compute devices is inter-device communication bandwidth. Our test case, simulation of electrical propagation in the heart using Finite Element Modelling (FEM), requires solving large systems of ODEs and PDEs over millions of nodes. Since the performance achieved is both energy- and cost-efficient, we suggest that the APU architecture has potential as a key building block of HPC architectures for FEM.
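Iterative sparse solvers spend much of their time in sparse matrix-vector products, which is a large part of why such problems are memory-bound. The C++ sketch below assumes the common compressed sparse row (CSR) format, since the abstract does not name a storage format. It performs one multiply and one add per stored nonzero while loading a value, a column index and an entry of `x`, so memory traffic rather than arithmetic usually limits its speed. The names `CsrMatrix` and `spmv` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) storage of a sparse matrix.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row r owns entries [row_ptr[r], row_ptr[r + 1])
    std::vector<std::size_t> col_idx;  // column of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes y = A * x, touching only the stored nonzeros.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.col_idx.size() == a.values.size());
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];  // indirect, bandwidth-bound access
        }
        y[r] = sum;
    }
    return y;
}
```

Each row is independent, so on a GPU the outer loop is typically spread across work-items; when the matrix and vectors must first be copied to a discrete device, that transfer adds the inter-device bandwidth cost the abstract identifies.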

clMAGMA: Heterogeneous High-Performance Linear Algebra with OpenCL

Stan Tomov | UTK, Research Director | PT-4255

The use of GPUs is becoming pervasive in high-performance scientific computing. To further accelerate and enable this transition, fundamental libraries often must be redesigned to fully exploit the power that GPUs present. We present clMAGMA, an OpenCL port of the current state-of-the-art developments on “Matrix Algebra on GPU and Multicore Architectures” (MAGMA). The new developments, combined with the use of OpenCL, will further propel clMAGMA’s portability and impact on the nation’s software cyber infrastructure.

OpenCL 1.2

German Andryryev | AMD | PT-4290

The topics discussed in this session include a wide variety of extensions available in OpenCL 1.1 that have been integrated into the core specification and runtime in OpenCL 1.2.

Fabric Engine: High-Performance Computing for Dynamic Languages

Peter Zion | Fabric Engine Inc., Chief Architect & Co-Founder | WT-4540

Fabric Engine is a high-performance processing engine that integrates with dynamic languages such as JavaScript and Python and exposes an interface for defining high-performance, multi-threaded, native computation. In this talk, Peter Zion will present an overview of Fabric’s architecture and how it can be used to bring multi-threaded performance to web applications.