HSAIL-Based GPU Offload: The Quest for Java Performance Begins
GPU offload is a well-known technique for accelerating parallelizable programs, but it has been slower to penetrate the Java space because current JVMs lack GPU code generation support. Sumatra is the first open source project that aims to integrate GPU offload capabilities directly into the JVM.
The AMD Runtimes team has submitted a patch to the OpenJDK community that extends the Java Virtual Machine to generate code that can be executed on the GPU/APU. This project leverages an OpenJDK project known as Graal (http://openjdk.java.net/projects/graal/), which is a highly extensible JIT compiler for the HotSpot JVM, featuring backends for different ISAs (x86, SPARC).
AMD’s submission extends Graal with a backend for generating HSAIL code, the intermediate format defined by the HSA (Heterogeneous System Architecture) Foundation (http://hsafoundation.com/standards/). This allows many Java programs to be compiled and executed on HSAIL-enabled GPU/APU devices. Although this work is a prototype, we have included several working unit test cases, including Mandelbrot and NBody.
The parameters of a compiled method can be primitives or objects. When a parameter is an array, the array object itself is passed, not just its data.
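To make the parameter-passing point concrete, here is a minimal plain-Java sketch. The class name `ArrayParamSketch` and the method `scale` are hypothetical, not part of the Graal HSAIL test suite, and the loop in `main` only emulates work-items on the CPU. Because the method receives the array object rather than a bare data pointer, its body can read `data.length` as well as the elements.

```java
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not taken from the Graal HSAIL tests.
final class ArrayParamSketch {

    // 'factor' is a primitive parameter; 'data' is the array object itself,
    // so the body can use data.length in addition to the element values.
    static void scale(int factor, int[] data, int gid) {
        if (gid < data.length) { // the length comes from the array object
            data[gid] *= factor;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        // Emulate one invocation per work-item on the CPU. More work-items
        // than elements are launched on purpose to exercise the length guard.
        for (int gid = 0; gid < 8; gid++) {
            scale(3, data, gid);
        }
        System.out.println(Arrays.toString(data)); // prints [3, 6, 9, 12]
    }
}
```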
The test cases (except for BasicHSAILTest) require an HSAIL simulator or hardware to execute. Without either, they output the generated HSAIL code, which is useful for debugging. An open source simulator for HSAIL will be released soon.
Moreover, BasicHSAILTest provides a template for adding Java code snippets and viewing the generated HSAIL code without executing it.
Example: HSAIL code generated for Squares
To illustrate how HSAIL code generation works, below is a simple JUnit test case that squares the contents of two integer arrays.
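For orientation, the per-element work in a squaring kernel can be sketched in plain Java as follows. This is not the JUnit test from the Graal patch: the class `SquaresSketch`, the separate `in` and `out` arrays, and the `gid` parameter are assumptions made for illustration, and the loop merely emulates work-items sequentially on the CPU.

```java
import java.util.Arrays;

// Hypothetical sketch: not the actual test case, and no Graal or HSAIL classes are used.
final class SquaresSketch {

    // Kernel body: each call handles exactly one element, selected by gid.
    static void square(int[] in, int[] out, int gid) {
        out[gid] = in[gid] * in[gid];
    }

    // CPU stand-in for a kernel launch: one call per work-item id.
    static int[] squareAll(int[] in) {
        int[] out = new int[in.length];
        for (int gid = 0; gid < in.length; gid++) {
            square(in, out, gid);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = squareAll(new int[]{0, 1, 2, 3, 4});
        System.out.println(Arrays.toString(result)); // prints [0, 1, 4, 9, 16]
    }
}
```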
The test will generate the following HSAIL code.
Pretty neat, isn’t it? Note that you can also write your test case in the following way (using the JDK 8 lambda syntax) and generate the same code as above:
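A lambda-based formulation of the squaring work might look roughly like the sketch below. It uses only the standard JDK 8 `IntStream` API and runs on ordinary CPU threads. The class name `SquaresLambdaSketch` is hypothetical, and the sketch does not show how the Graal test harness itself hands the lambda to the HSAIL backend.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch using only standard JDK 8 APIs; not the actual test case.
final class SquaresLambdaSketch {

    static int[] squareAll(int[] in) {
        int[] out = new int[in.length];
        // The lambda plays the role of the kernel body; gid is the element index.
        // Each index writes a distinct slot of 'out', so the parallel run is race-free.
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(gid -> out[gid] = in[gid] * in[gid]);
        return out;
    }

    public static void main(String[] args) {
        int[] result = squareAll(new int[]{0, 1, 2, 3, 4});
        System.out.println(Arrays.toString(result)); // prints [0, 1, 4, 9, 16]
    }
}
```

Expressing the loop as a parallel `forEach` over an index range states explicitly that the iterations are independent, which is the property a GPU offload needs.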
Stay tuned for our next blog post, which will show the code generated for a more complex example involving arrays of objects.
This post is the opinion of the author and may not represent AMD’s positions, strategies or opinions. Links to third party sites and references to third party trademarks are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.