Find an Open Source library here to help you code faster with faster code

From optimizing heterogeneous scheduling, data transfers, and cluster communications to working with OpenCL™ on Mac, Windows, or Linux, you’re sure to find an Open Source library here to help you code faster with faster code.

XvBA SDK and Tools

X-Video Bitstream Acceleration (XvBA) is AMD’s video acceleration API for Linux. It lets Linux applications use the Unified Video Decoder (UVD) engine in AMD GPUs to accelerate video decoding. The XvBA SDK contains the header file and the XvBA specification. XvBA Tools is a library and a small suite of tools that demonstrate the use of XvBA.


PARALUTION is a sparse linear algebra library with OpenCL™ support. It has a rich collection of iterative solvers and preconditioners. The library is well documented and provides various examples. The code can be compiled under Linux/Unix, Mac OS and Windows.

Bolt C++ Template Library

Bolt C++ provides a Standard Template Library (STL) compatible library of high-level constructs for creating accelerated data-parallel applications. Code written using the STL or other STL-compatible libraries can be converted to Bolt in minutes. Bolt requires significantly fewer lines of code and less developer effort.


Barracuda is an OpenCL library for Ruby, installable on Mac OS X, that supports signed integers and floats. Test integer-to-float computations have yielded up to 10x speed increases over non-heterogeneous computations performed on the same hardware.


CLOGS is a library for higher-level operations on top of the OpenCL™ C++ API. It is designed to integrate with other OpenCL™ code, including synchronization using OpenCL™ events. Currently radix sorting and exclusive scan are supported.
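To make the term concrete, here is a minimal sequential sketch of what an exclusive scan computes. It is plain C++ on the CPU, not CLOGS code and not an OpenCL kernel; the function name `exclusive_scan_ref` is a hypothetical name chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference for an exclusive scan (prefix sum):
// out[i] holds the sum of in[0] .. in[i-1], so out[0] is always 0.
// exclusive_scan_ref is a hypothetical name, not part of any library.
std::vector<unsigned> exclusive_scan_ref(const std::vector<unsigned>& in) {
    std::vector<unsigned> out(in.size());
    unsigned running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of everything before element i
        running += in[i];   // then fold element i into the running total
    }
    return out;
}
```

For the input 3, 1, 4, 1, 5 this yields 0, 3, 4, 8, 9. A GPU library computes the same result in parallel; a reference like this is mainly useful for checking device output.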


Cloo is an open source, easy to use, managed library which enables .NET/Mono applications to take full advantage of the OpenCL™ framework.


StarPU is a task programming library for CPU/GPU hybrid architectures, allowing for optimized heterogeneous scheduling, data transfers, and cluster communications.


libclc is an open source, BSD/MIT dual-licensed implementation of the library requirements of the OpenCL C programming language. It is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements and allows a target to override the generic implementation at the granularity of individual functions.


ocl-radix-sort is a C++ class for sorting integer lists in OpenCL without needing extra libraries or SDKs. It implements the radix sort algorithm: each integer has _TOTALBITS bits, the radix has _BITS bits, and the sort runs in several passes, each of which sorts the integers on one group of _BITS bits.
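The pass structure described above can be sketched on the CPU as a least-significant-digit radix sort. This is an illustrative sketch, not the ocl-radix-sort source: the constants `kTotalBits` and `kBits` are hypothetical stand-ins for _TOTALBITS and _BITS, the values 32 and 8 are assumed, and the sketch handles unsigned 32-bit keys only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the _TOTALBITS and _BITS parameters (assumed values).
constexpr unsigned kTotalBits = 32;  // bits per key
constexpr unsigned kBits = 8;        // bits per radix digit
static_assert(kTotalBits % kBits == 0, "key width must be a multiple of the radix width");

// Sorts unsigned keys in kTotalBits / kBits passes, least significant digit first.
void radix_sort_ref(std::vector<std::uint32_t>& keys) {
    constexpr std::size_t kBuckets = std::size_t{1} << kBits;
    constexpr std::uint32_t kMask = (1u << kBits) - 1u;
    std::vector<std::uint32_t> scratch(keys.size());

    for (unsigned shift = 0; shift < kTotalBits; shift += kBits) {
        // 1. Histogram: count how many keys have each digit value.
        std::vector<std::size_t> offsets(kBuckets + 1, 0);
        for (std::uint32_t k : keys) {
            ++offsets[((k >> shift) & kMask) + 1];
        }
        // 2. Prefix sum: offsets[d] becomes the first output slot for digit d.
        for (std::size_t d = 0; d < kBuckets; ++d) {
            offsets[d + 1] += offsets[d];
        }
        // 3. Stable scatter: keys with equal digits keep their relative order,
        //    which preserves the ordering established by earlier passes.
        for (std::uint32_t k : keys) {
            scratch[offsets[(k >> shift) & kMask]++] = k;
        }
        keys.swap(scratch);
    }
}
```

With these assumed values the sort takes four passes. A smaller radix means fewer buckets per pass but more passes, which is the trade-off the two parameters control.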


cl4d is an object-oriented wrapper for the OpenCL C API written in the D programming language. The philosophy behind cl4d is to provide a thin layer on top of the C API that makes working with OpenCL less painful by harnessing D’s linguistic power.


FFmpeg is a popular, leading open source media library and framework that can decode, encode, transcode, mux, demux, stream, filter, and play media in a large number of formats. AMD teams have added OpenCL™ support to the FFmpeg library and have accelerated the deshake and unsharp video filters using OpenCL™, which can be useful in video transcoding, video editing, live broadcast streaming, and much more. OpenCL™ acceleration of more video filters is in the works.


SnuCL is an OpenCL framework that extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster consists of a single host node and multiple compute nodes, connected by an interconnection network such as Gigabit Ethernet or InfiniBand switches. The host node contains multiple CPU cores, and each compute node consists of multiple CPU cores and multiple GPUs. For such clusters, SnuCL gives the programmer the illusion of a single heterogeneous system.


clpp is an OpenCL Data Parallel Primitives Library: a collection of data-parallel algorithm primitives such as parallel prefix sum (“scan”), parallel sort, and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
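As an illustration of how a scan serves as a building block, the sketch below performs stream compaction in three steps: flag the elements to keep, scan the flags to get each survivor's output position, then scatter. It is sequential C++ rather than clpp code, and the function name `compact_nonzero` and the "keep nonzero values" predicate are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Stream compaction built from a scan: keeps the nonzero elements of `in`,
// preserving their order. compact_nonzero is a hypothetical name.
std::vector<int> compact_nonzero(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<std::size_t> flags(n), pos(n);

    // 1. Flag: 1 for elements to keep, 0 for elements to drop.
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }
    // 2. Exclusive scan of the flags: pos[i] is the number of kept elements
    //    before i, which is exactly the output index of element i if kept.
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += flags[i];
    }
    // 3. Scatter: each kept element writes to its own slot, so on a parallel
    //    device no two work-items would write to the same location.
    std::vector<int> out(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            out[pos[i]] = in[i];
        }
    }
    return out;
}
```

Steps 1 and 3 are independent per element, so only the scan needs a dedicated parallel primitive. That is why libraries of this kind treat scan as a core operation.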


VexCL is a vector expression template library for OpenCL. It was created to ease OpenCL development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop OpenCL applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, and sparse matrix-vector multiplication. Multi-device and even multi-platform computations are supported.