If you haven’t heard the news already, AMD has just released drivers supporting the latest OpenCL™ 2.0 standard from Khronos. We think this marks a major milestone on the path toward better heterogeneous compute acceleration. OpenCL 2.0 implements many of the advances AMD has been discussing as part of our Heterogeneous System Architecture (HSA) initiative. Notably, sharing memory that holds pointer-based data structures between GPU and CPU devices can greatly simplify the steps involved in enlisting the GPU for compute acceleration. The ability of the GPU device to initiate compute tasks itself, via the OpenCL 2.0 Device Enqueue feature, opens up a much more powerful programming model for compute kernels. The generic address space is also a big programmability advantage over OpenCL 1.2, because pointers no longer have to be qualified with a named address space. OpenCL 2.0 also introduces a new memory object called Pipe, which organizes data as a FIFO and is useful for applications with a producer-consumer design. These and other advances of OpenCL 2.0 will help you tap into the tremendous performance potential of modern heterogeneous systems.
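To make the producer-consumer idea concrete, here is a minimal host-side sketch in plain C of the FIFO behavior a Pipe provides: packets come out in the order they went in, writes fail when the queue is full, and reads fail when it is empty. This is only an analogy. The names `fifo_t`, `fifo_write`, `fifo_read`, and `FIFO_CAPACITY` are made up for illustration and are not part of the OpenCL API, and a real Pipe is created by the host and accessed from kernels.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity FIFO; illustrative names, not OpenCL API. */
#define FIFO_CAPACITY 8

typedef struct {
    int    slots[FIFO_CAPACITY];
    size_t head;   /* index of the oldest packet */
    size_t count;  /* number of packets currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    f->head = 0;
    f->count = 0;
}

/* Producer side: append one packet. Returns false when the FIFO is full. */
bool fifo_write(fifo_t *f, int value)
{
    if (f->count == FIFO_CAPACITY)
        return false;
    f->slots[(f->head + f->count) % FIFO_CAPACITY] = value;
    f->count++;
    return true;
}

/* Consumer side: remove the oldest packet. Returns false when empty. */
bool fifo_read(fifo_t *f, int *out)
{
    if (f->count == 0)
        return false;
    *out = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return true;
}
```

A producer would call `fifo_write` for each packet it generates and a consumer would call `fifo_read` until it returns false. In the SDK, the PipeProducerConsumerKernels sample shows the same pattern with a producer kernel and a consumer kernel.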

Comprehensive OpenCL 2.0 support

In concert with the OpenCL 2.0 driver, AMD has made available a Beta version of the AMD APP SDK 3.0 with what we believe is the industry’s most comprehensive OpenCL 2.0 support. AMD APP SDK 3.0 Beta contains a complete set of sample code illustrating how to utilize each of the major new features of OpenCL 2.0. Some of these have been made available over the past weeks via a blog-series found here.

For the Linux® developers out there targeting AMD APUs, AMD APP SDK 3.0 Beta provides a glimpse into some of the most interesting optional features defined in OpenCL 2.0: fine-grained-buffer SVM with platform atomics. These features allow you to share data objects coherently between the CPU and GPU with fine-grained synchronization. This programming construct has long been available on multi-core CPUs – and now it’s the GPU’s turn to join the party. Over time, expect to see this feature much more widely available, but you can check it out today by downloading the AMD APP SDK 3.0 Beta.
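For readers who have not used this construct on the CPU side, here is a small sketch of it in standard C11: a shared linked list whose head pointer is updated with an atomic compare-and-exchange loop. OpenCL 2.0 atomics are modeled on the C11 ones, so the shape carries over. The types and functions here (`node_t`, `shared_list_t`, `list_push`, `list_length`) are hypothetical names chosen for illustration. The sketch runs on the CPU only and does not involve SVM.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type for a shared, pointer-based structure. */
typedef struct node {
    int          value;
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) head;  /* updated only through atomic operations */
} shared_list_t;

void list_init(shared_list_t *list)
{
    atomic_init(&list->head, NULL);
}

/* Push a node at the front. If another thread changes head between the
 * load and the exchange, the exchange fails, `expected` is refreshed with
 * the current head, and the loop retries. */
void list_push(shared_list_t *list, node_t *n)
{
    node_t *expected = atomic_load(&list->head);
    do {
        n->next = expected;
    } while (!atomic_compare_exchange_weak(&list->head, &expected, n));
}

/* Count nodes; intended to be called once all pushes have finished. */
size_t list_length(shared_list_t *list)
{
    size_t len = 0;
    for (node_t *p = atomic_load(&list->head); p != NULL; p = p->next)
        len++;
    return len;
}
```

Several CPU threads can call `list_push` on the same list without a lock. Fine-grained SVM with platform atomics is what lets GPU work-items take part in this kind of update alongside the CPU, as the SVMAtomicsBinaryTreeInsert sample does for binary tree insertion.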

One of the cool things about this OpenCL 2.0 announcement is that this new functionality is already supported on recent APUs and GPUs from AMD. See here for the full list of supported products. If you’ve got one of these products, you can start programming and deploying software with OpenCL 2.0 today!

There’s More

There are a few other items also worth mentioning for AMD APP SDK 3.0 Beta. We’ve added support for the Bolt 1.3 library, with new samples for the Bolt C++ AMP library as well as a sample demonstrating SPIR 1.2 binary consumption. We’ve also improved the installation process by providing a Web-based installer that lets you download only what you choose, while still allowing the downloaded package to be distributed locally to the rest of your team. (Windows only for now – stay tuned for announcements on Linux support.) We’ve also updated the OpenCL Programming Guide with many improvements, including full coverage of OpenCL 2.0 features. Check it out – we think it’s the best OpenCL programmer’s guide available.

Get Started

To use AMD APP SDK 3.0 Beta, download and install the latest AMD Catalyst™ Omega driver, which supports OpenCL 2.0. Then head to the blogs, or dive into the examples in the SDK, and have fun.

Because this is a Beta release, we are eager for your feedback. Give us kudos, complaints, and suggestions at the AMD OpenCL developer forum. Where possible, we will incorporate your feedback.

And if you’re curious, at the end of this post you will find a list of the new and updated samples in the SDK. A quick look at the list will give you an idea of the new power and capability available to you in the AMD APP SDK 3.0 Beta.

We look forward to hearing from you on the forum. You can also comment by replying to this blog. We do listen.

New samples

Sample | OpenCL™ 2.0 Feature | Description
SVMBinaryTreeSearch | SVM Coarse Grain | Demonstrates the coarse-grain Shared Virtual Memory (SVM) feature of OpenCL 2.0 using a Binary Tree search algorithm
SimplePipe | Pipe | Demonstrates the Pipe memory object and its APIs
PipeProducerConsumerKernels | Pipe | Demonstrates the Pipe as a data-sharing FIFO for a producer kernel and a consumer kernel
BuiltInScan | New Workgroup Built-in APIs | Demonstrates the work-group-level scan and work-group-level broadcast features introduced in OpenCL 2.0 using the PrefixSum algorithm
ImageBinarization | Image Read and Write | Demonstrates using images with read_write qualifier support, which is new in OpenCL 2.0
RecursiveGaussian_ProgramScope | Program Scope Variable | Demonstrates Program Scope Variables, a new feature of OpenCL 2.0, using a Recursive Gaussian filter implementation
SimpleGenericAddressSpace | Generic Address Space | Demonstrates the Generic Address Space feature introduced in OpenCL 2.0, which allows pointers to be declared without qualifying them with a named address space
RangeMinimumQuery | Shared Virtual Memory pointer with offset | Demonstrates passing a pointer with an offset as a kernel argument, new in OpenCL 2.0, using the Range Minimum Query algorithm
SVMAtomicsBinaryTreeInsert | SVM Fine Grain Buffer + Platform Atomics | Demonstrates the Fine Grain SVM buffer with Platform atomics using a Binary Tree node insertion algorithm
CalcPie | C++11 Atomics | Demonstrates atomics in OpenCL 2.0 by calculating the value of Pi using Monte Carlo analysis
FineGrainSVM | SVM Fine Grain Buffer + C++11 Atomics | Demonstrates the memory model of loads and stores from the new C++11 standard, which is adopted by OpenCL 2.0 (Linux APU device)
FineGrainSVMCAS | SVM Fine Grain Buffer + C++11 Atomics | Demonstrates the “CompareAndSwap” atomic operation, called “atomic_compare_exchange”, introduced in OpenCL 2.0 (adopted from the C11 standard; requires a Linux APU device)
RegionGrowingSegmentation | Device-side Enqueue | Demonstrates how to use the device-side enqueue feature of OpenCL 2.0 for a Region Growing Segmentation algorithm
DeviceEnqueueBFS | Device-side Enqueue | Demonstrates a Breadth First Search implementation using the device-side enqueue feature of OpenCL 2.0
ExtractPrimes | Device-side Enqueue + New Workgroup Built-in APIs | Demonstrates the new workgroup built-ins and device-side enqueue in an algorithm that finds prime numbers
SimpleSPIR | SPIR Consumption (not an OpenCL 2.0 feature) | Demonstrates SPIR code consumption using OpenCL APIs
SimpleDepthImage | Depth Image | Demonstrates the depth Image APIs

Updated samples

Sample | OpenCL™ 2.0 Feature | Description
GlobalMemoryBandwidth | Shared Virtual Memory | Measures the peak bandwidth of the device buffer. For devices on OpenCL version 2.0 and higher, it additionally shows peak bandwidth for the Shared Virtual Memory (SVM) buffer
BinarySearch_DeviceSideEnqueue | Device-side Enqueue | Enhanced to use device-side enqueue for Binary Search on an OpenCL 2.0 device. Uses iterative host-side enqueue for OpenCL 1.x devices
BufferImageInterop | Buffer Image Interop | Updated to skip checking for the BufferImageInterop extension on an OpenCL 2.0 compliant platform, as it is a core feature of OpenCL 2.0
BufferBandwidth | Shared Virtual Memory | Now also measures the SVM buffer bandwidth on an OpenCL 2.0 compliant device
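
If the "scan" in the BuiltInScan entry is unfamiliar, here is a serial reference in plain C for what an inclusive and an exclusive add-scan (prefix sum) compute. OpenCL 2.0 provides work-group built-ins such as work_group_scan_inclusive_add and work_group_scan_exclusive_add that produce these results across the work-items of a work-group. The functions below are illustrative stand-ins that show only the expected values; they are not taken from the SDK sample.

```c
#include <stddef.h>

/* Inclusive scan: out[i] is the sum of in[0] through in[i]. */
void scan_inclusive_add(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        running += in[i];
        out[i] = running;
    }
}

/* Exclusive scan: out[i] is the sum of in[0] through in[i-1]; out[0] is 0. */
void scan_exclusive_add(const int *in, int *out, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = running;
        running += in[i];
    }
}
```

For the input 3, 1, 4, 1, 5 the inclusive scan yields 3, 4, 8, 9, 14 and the exclusive scan yields 0, 3, 4, 8, 9. A serial reference like this is also a handy way to check the output of a kernel that uses the built-ins.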


Marty Johnson is Director of Product Engineering at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.


7 Responses

  1. Daniel Gardner

    When are you going to dump C/C++ as a development language for APP and go for C# or Delphi? I find C/C++ the worst development language out there. It’s the worst thought-out language. I think it was developed only to be in opposition to
    Pascal, which was developed in Europe. After programming in Delphi I find C/C++ just cryptic and sloppy. The header files are
    a total mess, designed only to get around the problems of a sloppy compiler. Open a header file and you find that this file
    includes that file, which includes that file, and it goes on forever. What a mess! Try nutting out all those #ifdef and
    #endif constructs, another big mess. Also, doing a recompile means the compiler may have to go through all those header
    files again. Delphi has compiled units: end of header file problem. Also the syntax of C/C++ is just cryptic garbage.

    As to OpenCL, I think it should be done in C# or Pascal (Delphi), with all the routines hidden in a library or DLL so you just call the
    routines. Integrate OpenCL into Visual Studio or Delphi and program in the same syntax for both CPU and GPU devices.

    At the moment I have two AMD computers and three AMD graphics cards, and I’ve had many older AMD products. There is nothing
    wrong with AMD, just the dreadful C/C++ development environment for OpenCL.

    C/C++ was developed in the day of the DOS box, drop it and move on!

    • gruffi

      Dump C/C++ in favor of C# or Delphi? That was a really good one. Thanks for the laugh!

      Okay, let’s be serious. OpenCL is an open standard. That’s why it should NEVER EVER be exclusively tied to a proprietary and platform dependent language like C#. Period. Additional language support is another story.

      It’s also funny that you complain about the “cryptic C/C++” and mention Delphi in the same sentence. Delphi isn’t any better. Believe me. I started programming with Pascal and also worked on big projects written in Delphi. To date I have written most of my code in C/C++. I would prefer C/C++ over Pascal/Delphi any day. Probably every programming language you are not used to looks cryptic at first. The syntax of C/C++ actually is okay. It’s not the best, but also not the worst. It’s acceptable. Unnecessary parentheses and semicolons especially can be annoying. But that’s a problem you have to deal with in many other languages as well.

      Yes, the header system of C/C++ can be a mess. But that is not nearly enough of a reason to dump a language. In most situations it’s not really a problem if your code is well designed. You spend way more time on other problems than on simple #ifndef/#define/#endif constructs. A good IDE further simplifies the header problem. For example, Microsoft’s Visual Studio supports precompiled headers, which can improve compilation times a lot. GCC (MinGW) also supports precompiled headers.

      In my opinion C++ (not C) is still the best programming language out there. It may not be the easiest or most elegant one. But it’s still the most powerful and most flexible programming language. It combines the low level aspects of C with the high level aspects of modern object oriented programming languages. Almost every platform supports C++ development. Furthermore C++11 and C++14 introduced many significant improvements. More will come with C++17. C++ has progressed significantly since the old C++98 and C++03 standards. So, please AMD don’t dump C++ support. But I’m also sure you won’t. It probably was just wishful thinking of Daniel.

  2. Marty

    Hi Daniel, thanks for the suggestion. Yah, you aren’t the only one with these complaints for sure – in fact I share some of your frustrations. That said, the reality remains that C/C++ is still hugely popular, so it’s unlikely we will “dump” it anytime soon. Would definitely be interested in reliable market demand data for OpenCL with other languages tho – we’re always interested in bringing this kind of acceleration to a receptive audience.

  3. Marco

    Cool, exciting, whatever… I can’t download it.
    For Windows, only the InstallerManager option is available. Why? I downloaded it and it does not work. I’d just like to run OpenCL 2.0 applications and have fun with the new features. Where is the some_megabytes Windows7-64bit version ready to be downloaded?

    • jtrudeau

      Marco, are we talking about the APP SDK? If you download and run the InstallerManager, it will get all the necessary files, download, and install them. The initial download is much, much smaller, and you can install everything, or pick and choose what you install before you get all the hundred megabytes of files.