OpenCV 3.0 – The Transparent API and OpenCL™ Acceleration

Hi everyone.

I am often asked how to compare performance of a software implementation that uses OpenCL™ to that of a “native CPU” implementation that uses only pure C++, possibly also using Integrated Performance Primitives (IPP). This comparison is straightforward in OpenCV 3.0, a popular computer vision library that AMD has supported since 2011, and I will explain how to do that shortly.

Transparent Acceleration via OpenCL

First, let me give you a brief introduction to OpenCV 3.0. In OpenCV 3.0, the library supports transparent acceleration via OpenCL. At runtime, if OpenCL is available and not disabled, OpenCL will be used by default (and preferentially), if the algorithm has an OpenCL implementation. Just like 2.4, there are plenty of algorithms with OpenCL implementations, especially in the imgproc module!

Enabling or disabling OpenCL is controlled globally via an environment variable, which you should set or clear before you run the performance tests or the samples. In this post I’m using 64-bit Windows 7. The modifications for other platforms should be fairly obvious to those skilled in the art!

To disable OpenCL (to enable pure native runs) do:

set OPENCV_OPENCL_RUNTIME=qqq

To re-enable OpenCL (remember, it was enabled by default, before you disabled it with the line above), you need to clear the environment variable, for example:

set OPENCV_OPENCL_RUNTIME=

You can also specify a particular OpenCL device for the run, for example:

set OPENCV_OPENCL_DEVICE=:GPU:0
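
If you prefer to drive the runs from a script rather than typing the set commands by hand, the same switches can be applied per process. The following is a minimal Python sketch and not part of OpenCV: the helper name opencl_env is made up for illustration, and it relies on the behavior described above, namely that a bogus runtime name such as qqq disables OpenCL.

```python
import os


def opencl_env(enabled, device=None, base=None):
    """Return a copy of the environment configured for one test run.

    Hypothetical helper: only the variable names and the example
    values 'qqq' and ':GPU:0' come from the article.
    """
    env = dict(os.environ if base is None else base)
    # Start from a clean slate so stale settings do not leak in.
    env.pop("OPENCV_OPENCL_RUNTIME", None)
    env.pop("OPENCV_OPENCL_DEVICE", None)
    if not enabled:
        # A bogus runtime name disables OpenCL, giving a native run.
        env["OPENCV_OPENCL_RUNTIME"] = "qqq"
    elif device is not None:
        # For example ":GPU:0", as in the article.
        env["OPENCV_OPENCL_DEVICE"] = device
    return env
```

The returned dictionary can be handed to subprocess.run(..., env=...) so that each test run gets its own setting without touching the shell you are working in.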

The Transparent API

Back to the transparent API: it enables you to unify, in a single code base, native and OpenCL-accelerated programming. You write your code only once! Gone are the days (of OpenCV 2.4), where you had to use different functions to enable an OpenCL run. In fact the “ocl” namespace and folder are gone, and so is the “oclMat.” There is a new unified data structure—the UMat—that handles data transfers if needed.

All code under the transparent API (“T-API”, “T.API”) needs to be numerically equivalent. The accuracy tests enforce that. This makes perfect sense. You don’t want to have different results depending on which platform you run your code on.

Library developers who wish to enhance the library should implement new functionality in both OpenCL and C++, and should add accuracy and performance tests. Comparing the C++ and OpenCL results is a good sanity check anyway!

Library users can just declare their variables as UMat type, and reap the benefit of transparent acceleration that works on all platforms supporting OpenCL (including discrete and integrated GPUs).

With that understanding, let’s get back to performance testing.

First, use the master branch from https://github.com/itseez/opencv to get the code. Configure cmake to generate code for your platform. OpenCV now provides IPP binaries, and IPP is enabled by default. This is great, because you can compare OpenCL and IPP directly on various platforms, and draw your own conclusions! You can also configure cmake to use multi-threading of your choice. MS concurrency is enabled by default.

If you are planning to compare data from many different platforms, it pays to be systematic and organized, in terms of naming conventions. I recommend that you name the output directory that cmake uses for the targets according to the options enabled in cmake. For example, a good naming convention for your “buildDir” is:

OCV[date]_[Compiler]_[cmake options]_[arch]

Replace the names in brackets with your configuration. For example, in the above [Compiler] might be VS2013, [cmake options] could be ocl_ipp or ocl_noipp, and so on. [arch] might be “x86”, “x64”, etc.
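
If you generate many build directories, a tiny helper keeps the names consistent. This is only a sketch of the convention above: the function name build_dir_name and the YYYYMMDD date format are my assumptions, not something the convention prescribes.

```python
def build_dir_name(date, compiler, cmake_options, arch):
    """Compose OCV[date]_[Compiler]_[cmake options]_[arch].

    Hypothetical helper; the date format is whatever you pass in.
    """
    fields = [str(f).strip() for f in (date, compiler, cmake_options, arch)]
    if not all(fields):
        raise ValueError("every field of the build directory name is required")
    date_part, rest = fields[0], fields[1:]
    return "OCV" + date_part + "_" + "_".join(rest)


# prints OCV20140916_VS2013_ocl_ipp_x64
print(build_dir_name("20140916", "VS2013", "ocl_ipp", "x64"))
```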

You should always leave WITH_OPENCL turned on (which is the default), otherwise you will not be able to compare the results of your tests!

After you generate the code in cmake, you need to build it (to state the obvious). Observe the binaries in [buildDir]\bin\Release.  There are performance tests per module, with naming conventions as follows:

opencv_perf_[module name].exe

We will use the image processing module, “imgproc,” as an example. The name of the performance test is, yes, you guessed it, opencv_perf_imgproc.exe!

Then, you need the test data. Get it from https://github.com/itseez/opencv_extra. You should use the “master” branch.

After you get the test data, you need to set an environment variable so that OpenCV performance tests will find the data directory:

set OPENCV_TEST_DATA_PATH=[path to opencv_extra-master]\testdata

Once again to state the obvious, use your actual path.

We can now run the test, and output the results to a file. Here again, for the purpose of comparison across runs, it pays to be systematic with the naming conventions. The naming convention I use is:

[ocv module]-[platform]-[cmake options]-[runtime options]-[arch]

For example, [ocv module] might be “imgproc”;  [platform] might be “KV35W” for a 35W Kaveri; [cmake options] and [arch] will be as above;  [runtime options] can be something like “oclgpu0”, “noocl”.

Finally, here is how to run the tests:

opencv_perf_[module name].exe --gtest_filter=*OCL* --gtest_output=xml:[output file as above].xml
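
For scripted runs, the output naming convention and the command line can be assembled in one place. The sketch below is illustrative only: result_name and perf_command are hypothetical helper names, and the command is meant to be launched from the directory that holds the performance test binaries.

```python
def result_name(module, platform, cmake_options, runtime_options, arch):
    """[ocv module]-[platform]-[cmake options]-[runtime options]-[arch]"""
    return "-".join([module, platform, cmake_options, runtime_options, arch])


def perf_command(module, output_stem):
    """Argument list for one performance test run (hypothetical helper).

    Both options take two leading hyphens, as Google Test expects.
    """
    return [
        "opencv_perf_%s.exe" % module,
        "--gtest_filter=*OCL*",
        "--gtest_output=xml:%s.xml" % output_stem,
    ]


stem = result_name("imgproc", "KV35W", "ocl_ipp", "oclgpu0", "x64")
print(" ".join(perf_command("imgproc", stem)))
```

The list form can be passed directly to subprocess.run(cmd, cwd=...), with cwd pointing at [buildDir]\bin\Release, which avoids quoting problems with the asterisks in the filter.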

That’s it! Simply run this on the platform of your choice. The good thing about OpenCL, and OpenCV, is that the OpenCL runtime is loaded dynamically. So you can build your code once (or at least once per cmake configuration and OS). You can also distribute the binaries to the test platforms, as long as they have the same OS. OpenCL will be loaded at runtime and your code will just work, using the OpenCL driver that is present in the system, and independent of whichever IDE you used to compile the “host” code. Of course, you can also disable OpenCL, as explained above, and then you end up with performance data on a native run, under the transparent API!

For illustration purposes in what follows, I am using OpenCV 3.0 code as of 9/16/2014, on a 35 Watt AMD Embedded R-Series APU, the RX-427BB. Some of you may know this as “Bald Eagle,” an embedded Kaveri APU with 8 OpenCL compute units. I ran the imgproc module twice on that platform: first enabling and then disabling OpenCL via the environment variable, as explained above.

Another great thing about OpenCV is that it comes with scripts you can use to compare different runs. They are located in [code_dir]\modules\ts\misc. Here is how to use my favorite, [code_dir]\modules\ts\misc\summary.py.

python.exe [code_dir]\modules\ts\misc\summary.py -o html [out1].xml [out2].xml > [comp].html

You can supply as many result files (of the same module) as you would like; summary.py will align the data and give you a nice web page with comparisons.

There’s More!

But wait, there’s more! While the web page is very useful, you can also load the data into Excel, and do your own statistics and plots. The summary.py script can output in csv format, but I have found that it is easier (i.e. less manual editing) to output in html, and then load the html into Excel. As the figures below show, in Excel you get external data from the web, and in the resulting dialog, just select the [comp].html file from above.

[Figures: getting external data from the web in Excel, and selecting the [comp].html file in the resulting dialog]

After importing, I usually clean up the file a bit, although you may be OK with it “as is.” Personally, I like to eliminate the comparison columns, which I can regenerate within Excel, and eliminate some top rows. I also substitute “ms” in the performance columns with an empty space. I split the “name of the test” into three columns, as follows: First, add two more columns next to “Name of test”, call them “subtest” and “config”. Go to Data->”Text to columns”->Delimited->Other (select “:” and treat consecutives as one). The figure below shows you the results.
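
The same cleanup can be scripted if you would rather not do it by hand in Excel. This is a minimal sketch under my own assumptions: the helper names are made up, the sample test name in the usage note is hypothetical, and the split stops after the third column so that any colons inside the configuration text are left alone.

```python
import re


def split_test_name(full_name):
    """Split on runs of ':' into (test, subtest, config), padding with ''.

    Consecutive colons count as one delimiter, as in the Excel step.
    """
    parts = re.split(r":+", full_name.strip(), maxsplit=2)
    parts += [""] * (3 - len(parts))
    return tuple(parts)


def parse_ms(cell):
    """Turn a performance cell such as '1.25 ms' into the float 1.25."""
    return float(cell.replace("ms", "").strip())
```

For example, split_test_name("OCL_Resize::ResizeFixture::(640x480, 8UC1)") yields the three columns "OCL_Resize", "ResizeFixture" and "(640x480, 8UC1)".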

[Figure: the worksheet after splitting the test name into the “Name of test”, “subtest” and “config” columns]

To calculate the OpenCL advantage column (column F in the figure), I divide how long it takes to execute a test natively (e.g. C++ without/with IPP), by how long it takes to execute it in OpenCL. If this ratio is greater than one, then good news for OpenCL!
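
Written out as code, the ratio looks like this. It is a sketch only; the function name opencl_advantage is mine, and the inputs are the two timings for the same test, in the same unit.

```python
def opencl_advantage(native_ms, opencl_ms):
    """Native time divided by OpenCL time; above 1.0 means OpenCL is faster."""
    if native_ms < 0 or opencl_ms <= 0:
        raise ValueError("timings must be positive")
    return native_ms / opencl_ms


print(opencl_advantage(10.0, 4.0))  # prints 2.5
```

A value of 2.5 means the native run took two and a half times as long as the OpenCL run; a value of 0.5 would mean OpenCL took twice as long.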

A Chart with the Advantage of OpenCL vs Other Native Runs

Last but not least, we can conveniently summarize the results using Excel’s PivotTable, or PivotChart. You can configure it as shown in the figure below.

[Figure: PivotTable/PivotChart configuration in Excel]

You get a very nice chart with the advantage of OpenCL vs other native runs, per test. The “grand total” is a cumulative average, across all tests. Even if you aren’t familiar with pivot tables, I’m sure you can derive the information you need from the spreadsheet data, as you prefer.
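
If you would rather skip the pivot table, the same summary is a few lines of Python. This sketch assumes the pivot is set to average the advantage column, so that the grand total is the average over all rows; the function name pivot_average and the sample test names in the usage note are made up.

```python
from collections import defaultdict
from statistics import mean


def pivot_average(rows):
    """rows: iterable of (test_name, advantage) pairs.

    Returns (per_test, grand_total): the mean advantage per test name
    and the mean over all rows.
    """
    groups = defaultdict(list)
    for name, advantage in rows:
        groups[name].append(advantage)
    per_test = {name: mean(values) for name, values in groups.items()}
    grand_total = mean(v for values in groups.values() for v in values)
    return per_test, grand_total
```

For instance, the rows ("resize", 2.0), ("resize", 4.0) and ("integral", 0.5) give a per-test average of 3.0 for "resize" and 0.5 for "integral". Note that the grand total is taken over the three rows, not over the two per-test averages.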

Obviously, in OpenCV not all algorithms are implemented equally well, and there is some obvious low-hanging fruit. For example, it wouldn’t violate any laws of physics if “integral” were faster on GPU/OpenCL than on the CPU (in fact it should be!), so there is some work left to be done. This is open source code, and we invite the community’s help!

Straightforward Performance Testing with OpenCV 3.0

Overall, the real purpose of this article is to show you how straightforward performance testing is with OpenCV 3.0, and to invite you to follow the steps above. Compare the performance of OpenCL runs against native runs of your choice, on platforms of various capabilities, and decide for yourself. You will likely want to compare “best runs” on comparable platforms (e.g. comparable in terms of power or price). It turns out, perhaps not surprisingly, that a “best run” may be platform (vendor) dependent. After all, if the platform does not support OpenCL, OpenCL will not win! However, under the transparent API those are details that do not really matter. It is the same code after all. You can write your code once and it will just work, both natively and OpenCL-accelerated, under the control of just an environment variable.

Thanks for your continuing support.

 

Dr. Harris Gasparakis is AMD’s OpenCV project manager, technical lead, and evangelist. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only.  Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

 

10 Responses

  1. Brian

    Hi,
    I’m trying to follow this guide and successfully built and tested OpenCV in release mode. However, when I’m building in Debug mode, I get error “cannot open file ‘python27_d.lib'” What do I need to do to build in debug mode? I’m using VS2013 under Win8.1.

    Also, I noticed that summary.py is under [code_dir]\modules\ts\misc instead of [code_dir]\misc\summary.py

    I’m a newbie to OpenCV. Where is the right place to ask question about OpenCV, AMD BLAS and HSA?
    I’d appreciate any pointers.

    Thank you.

  2. Harris Gasparakis

    Brian, thanks for your interest!

    Search for my previous blogs for instructions on how to build opencv, or look in opencv.org.

    For general OpenCV questions, please post at answers.opencv.org/questions; that’s the right place. Briefly, if you need the debug version of Python, you may want to build it yourself.

    For AMD BLAS, we have open-sourced it. Look for clMathLibraries on GitHub; it is not called “AMD” BLAS anymore. OpenCV needs to be updated to use it. Please feel free to contribute this; it should be a fun and straightforward thing to do!

    For HSA, you can look at the HSA Foundation website, and at various talks from our previous conference on SlideShare. In the context of vision, I have some talks there too. You should also look at OpenCL 2.0, which is one way HSA is exposed at the programmer’s level.

  3. mans

    Hello,
    I downloaded OpenCV 3.0Beta prebuild libraries from OpenCV.org. I am wondering if it has openCL enabled and how I can test if it is enabled or disabled.

    Is there any way that I can find what options were used to build OpenCV?

    Is there any way that I can find if an application runs on GPU or not?

    Any sample code that should work on GPU that I can use to test OpenCL and OpenCV

    Thanks

    • Tejeswini Sundaram

      Hey mans,

      In OpenCV-3.0 the architecture concept has been changed to the so-called Transparent API (T-API). In the new architecture a separate OpenCL-accelerated cv::ocl::function() is removed from external API and becomes a branch in a regular cv::function(). This branch is called automatically when it’s possible and makes sense from the performance point of view. In the Transparent API, same code can be run on both the GPU and CPU. The only difference being the data structure Mat has been replaced by UMat, which is a new type of array that wraps cl_mem when needed.

      Source : https://docs.google.com/presentation/d/1qoa29N_B-s297-fp0-b3rBirvpzJQp8dCtllLQ4DVCY/present?slide=id.p14

  4. Claudio

    Only one small note, you are missing a “-” in the command line:
    opencv_perf_[module name].exe –gtest_filter=*OCL* --gtest_output=xml:[output file as above].xml
    should be
    opencv_perf_[module name].exe --gtest_filter=*OCL* --gtest_output=xml:[output file as above].xml

    Thanks for this wonderful guide!

  5. Anna

    Hi, thanks for the guide above. I’m trying to compare the performance of facedetect with and without OpenCL. With UMat it is much slower than with Mat: the results show that UMat takes 9573.5 ms and Mat takes 160 ms.
    The result is weird, isn’t it? How can I see which platform the code runs on? Is OpenCL used only when the data is a UMat?

    • Steve, Seunghwa Song

      what kind of GPU are you using?

    If you didn’t set the environment variable OPENCV_OPENCL_DEVICE, which specifies the OpenCL device, the system could be using OpenCL on the CPU instead.

    Furthermore, in my experience, the face detection function gets less benefit from GPU computation.

    GPU performance was not that good because of memory copies between the CPU and GPU.

    • Steve, Seunghwa Song

      You can refer to my code
      github.com/sshtel/opencv3_practice

      BTW, HOG(pedestrian detection) showed much better performance on GPU.
      I will push it soon.

  6. Prasanna

    Harris Gasparakis, is it possible to take advantage of transparent api from “opencv python” interface or is it restricted to c++?
    Thanks in advance.

  7. Neamah

    I’m working on Mac OS Yosemite and I’m not sure how to enable OpenCL at runtime.

    I’ve managed to turn OpenCL on using cmake -DWITH_OPENCL=ON.
    I edited my .bash_profile with : export OPENCV_OPENCL_RUNTIME=
    I even tried changing it to : set OPENCV_OPENCL_RUNTIME=

    But when I compile and run my .cpp program, I get the following message:

    OpenCL IS not available …

    How can I solve this problem?