AMD has a long-standing track record of collaboration with open software development communities and continues to support the open source ecosystem. In addition to providing innovative advances in computing platforms, AMD has teams dedicated to building or enhancing open source tools and technologies to help software developers code faster for faster code.
AMD CodeAnalyst Performance Analyzer for Linux is an open source graphical front end to OProfile. The interface simplifies the collection of performance data, enhances data analysis, and shows profiles in both tabular and chart form. Users can drill down from system-level processes and modules to source code or instructions. AMD CodeAnalyst also supports Instruction-Based Sampling (IBS), a new performance measurement technique available on AMD Family 10h processors.
Aparapi is an API for expressing data parallel workloads in Java and a runtime component capable of converting the Java bytecode of compatible workloads into OpenCL™ so that it can be executed on a variety of GPU devices.
The open source clMath library contains OpenCL implementations of common BLAS and FFT routines. A joint project between AMD and AccelerEyes, the library enables developers to accelerate scientific and engineering computations on APUs and discrete graphics accelerators. The project is licensed under the Apache License, Version 2.0 and is available at https://github.com/clMathLibraries.
CodeSleuth is an Eclipse plug-in that allows developers to access hardware performance counters and relate them back to Java source. This tool enables you to track counter information from raw address locations, through the machine code emitted by Java Virtual Machines (JVMs), and back to the Java source. Once you identify the location of performance issues in the Java source, you can modify the code to improve application performance. Best of all, CodeSleuth works on all x86 platforms, not just AMD systems.
AMD Embedded Solutions give designers ample flexibility to design scalable, x86-based, low-cost and feature-rich products, and to drive energy conservation into their systems without compromising application performance or compatibility, graphics performance or features. AMD supports a wide range of products and technologies for Embedded solutions including AMD64 Embedded Processors, AMD M690T/E + SB600, RS780E + SB710, AMD 785E + SB8xx, Discrete GPUs and Geode LX processing technologies. AMD Embedded also supports the Linux Open Source and coreboot communities with public documentation, contributions to the community, publicly available reference boards and example code.
FlickrNet is an open source routine that implements the Flickr API. AMD submitted modifications to Flickr.cs, UploadProgressEvent.cs, and Lockfile.cs for use in the AMD Fusion Media Explorer for a better user experience.
The Unladen Swallow project is an optimization branch of CPython that focuses on speeding up the execution of Python code via a custom virtual machine with a JIT built on top of LLVM. AMD contributes performance optimizations for this important runtime.
AMD’s Operating System Research Center (OSRC) contributes to the core Linux kernel by providing enablement for AMD processors.
AMD has made several contributions to the OpenJDK project focusing on performance improvements and performance analysis. By improving JVM performance, finding ways to make Java work better across multi-core environments, dealing with data concurrency more efficiently, and handling garbage collection more effectively, we are building performance advantages directly into Java tools and environments, making developers’ jobs easier.
The OpenCL Emulator-Debugger is an open source project created by AMD that allows developers to compile and debug OpenCL kernels as C++ procedures with the full support of the Microsoft® Visual Studio® C++ development and debugging environments.
OProfile is a system-wide profiler for Linux systems. The OSRC maintains the kernel component of OProfile, and at times submits patches to fix issues. AMD CodeAnalyst for Linux is based on OProfile.
Perfmon2 is a hardware-based performance monitoring interface for Linux. The OSRC submits patches to add support for new CPU features.
PTLsim is an open-source full-system simulator that provides a detailed out-of-order core. The OSRC currently maintains the stable upstream version. We also use PTLsim for implementing AMD’s Advanced Synchronization Facility (ASF) and provide a released version containing ASF support: PTLsim-ASF.
QEMU is the user-space component for Linux’s KVM hypervisor. All virtualization code that doesn’t need to be in the kernel has to be submitted to upstream QEMU. QEMU is also a crucial component of the Xen hypervisor. AMD has contributed multi-core guest support, -cpu host support to propagate native CPUID bits to the guest, NUMA support for guests, support for new AMD instructions (SSE4a, popcnt) in the emulator, reworking of CPUID code for more accurate emulation, and the userspace part for cross-vendor migration.
Tapper is a new open source tool to help QA departments maintain a complete test life cycle from planning to execution and reporting.
x86info is a CPU identification utility for Linux. The OSRC adds patches to support new AMD CPUs.
For the Xen hypervisor, the OSRC supports and maintains both the base hypervisor (running on bare metal AMD hardware) and its SVM-using component.
XML-RPC is a Remote Procedure Calling protocol that works over the Internet. An XML-RPC message is an HTTP-POST request. The body of the request is in XML. A procedure executes on the server and the value it returns is also formatted in XML. AMD contributed code that disabled remote RPC calls.
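The request and response format described above can be sketched with Python's standard `xmlrpc.client` module. This is only an illustration of the wire format: the method name `example.add` and its arguments are hypothetical, no server is contacted, and the code is unrelated to AMD's contribution.

```python
import xmlrpc.client

# Hypothetical method name and arguments, chosen only for illustration.
# dumps() produces the XML document that would form the body of the HTTP POST.
request_body = xmlrpc.client.dumps((2, 3), methodname="example.add")

# A server's return value is also formatted in XML. Here we build the
# response locally instead of contacting a real server.
response_body = xmlrpc.client.dumps((5,), methodresponse=True)

# loads() decodes either document back into Python values.
params, method = xmlrpc.client.loads(request_body)
result, _ = xmlrpc.client.loads(response_body)

print(request_body)
print(response_body)
```

Printing the two bodies shows the `<methodCall>` and `<methodResponse>` XML documents that a client and server would exchange.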
X-Video Bitstream Acceleration (XvBA) is AMD’s video acceleration API for Linux. It allows Linux applications to take advantage of the UVD engine in AMD GPUs to accelerate video decoding.
The XvBA SDK contains the header file and XvBA Specification.
XvBA Tools is a library and small suite of tools for demonstrating use of XvBA.
Open Source Projects of Note from Outside of AMD
Aparapi allows Java developers to take advantage of the compute power of GPU and APU devices by executing data parallel code fragments on the GPU rather than being confined to the local CPU. It does this by converting Java bytecode to OpenCL at runtime and executing on the GPU.
Barracuda is an OpenCL library for Ruby, installable on Mac OS X, that supports signed integers and floats. Test integer-to-float computations have yielded up to 10x speed increases over non-heterogeneous computations performed on the same hardware.
cl4d is an object-oriented wrapper for the OpenCL C API written in the D programming language. The philosophy behind cl4d is to provide a thin layer on top of the C API which makes working with OpenCL less painful by harnessing D’s linguistic power.
CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently radix sorting and exclusive scan are supported.
Cloo is an open source, easy to use, managed library which enables .NET/Mono applications to take full advantage of the OpenCL framework.
clpp is an OpenCL Data Parallel Primitives Library. It is a library of data-parallel algorithm primitives such as parallel-prefix-sum (“scan”), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
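To make the role of these primitives concrete, here is a minimal sequential sketch in Python of an exclusive scan and of stream compaction built on top of it. It does not use clpp or OpenCL, and the function names `exclusive_scan` and `compact` are hypothetical; a data-parallel library would compute the same results across many work-items.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    if not values:
        return []
    inclusive = list(accumulate(values))
    return [0] + inclusive[:-1]


def compact(items, keep):
    """Stream compaction: keep only items for which keep(item) is true.

    The scan of the 0/1 flags gives each surviving item its output slot,
    which is what lets a parallel version write results without conflicts.
    """
    if not items:
        return []
    flags = [1 if keep(x) else 0 for x in items]
    offsets = exclusive_scan(flags)
    out = [None] * (offsets[-1] + flags[-1])
    for item, flag, offset in zip(items, flags, offsets):
        if flag:
            out[offset] = item
    return out


print(exclusive_scan([3, 1, 4, 1, 5]))            # [0, 3, 4, 8, 9]
print(compact([5, 2, 8, 1, 9], lambda x: x > 4))  # [5, 8, 9]
```

The scatter loop in `compact` has no dependency between iterations once the offsets are known, which is why scan is such a common building block for parallel algorithms.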
The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels, parameters, and so on, while preserving the ability to drop down to the lower-level OpenCL API at will when programmers want to get their hands dirty.
CLyther is a just-in-time specialization engine that makes it easy for Python developers to take advantage of OpenCL. CLyther is similar to Cython and PyPy. CLyther is a Python language extension that makes writing OpenCL code as easy as Python itself. CLyther currently only supports a subset of the Python language definition but adds many new features to OpenCL. CLyther exposes both the OpenCL C library as well as the OpenCL language to Python.
FortranCL is an OpenCL interface for Fortran 90. It allows programmers to call the OpenCL parallel programming framework directly from Fortran, so developers can accelerate their Fortran code using graphics processing units (GPUs) and other accelerators.
JavaCL is an API that wraps the OpenCL library to make it available to the Java platform. With JavaCL, Java programs can execute tasks directly on graphics cards and benefit from their massive parallel horsepower. JavaCL comprises the following parts: JavaCL Core, an object-oriented API that retains all the power of the OpenCL API without most of the C head-scratching; demos; basic utilities (parallel reduction, experimental matrix implementation for UJMP); and an experimental Scala DSL (Domain-Specific Language).
libclc is an open source, BSD/MIT dual licensed implementation of the library requirements of the OpenCL C programming language. libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.
ocl-radix-sort is a C++ class for sorting integer lists in OpenCL without needing extra libraries or SDKs. It uses the radix sort algorithm. Each integer is made of _TOTALBITS bits, and the radix is made of _BITS bits. The sort runs in several passes, each of which sorts the keys on the group of bits corresponding to the radix.
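The pass structure can be illustrated with a short sequential sketch in Python. This is not the ocl-radix-sort code: the parameters `total_bits` and `bits` simply mirror the roles of _TOTALBITS and _BITS, and processing the least significant group of bits first is an assumption made for this example.

```python
def radix_sort(keys, total_bits=32, bits=4):
    """Sort non-negative integers smaller than 2**total_bits.

    Runs total_bits // bits passes. Each pass is a stable counting sort
    on one group of `bits` bits, starting with the least significant group.
    """
    assert total_bits % bits == 0, "bits must divide total_bits"
    radix = 1 << bits   # number of buckets per pass
    mask = radix - 1    # selects one group of bits

    for shift in range(0, total_bits, bits):
        # Histogram: count how many keys fall into each bucket.
        counts = [0] * radix
        for key in keys:
            counts[(key >> shift) & mask] += 1

        # Exclusive scan of the histogram gives each bucket's start offset.
        offsets = [0] * radix
        for digit in range(1, radix):
            offsets[digit] = offsets[digit - 1] + counts[digit - 1]

        # Stable scatter of the keys into their buckets.
        out = [0] * len(keys)
        for key in keys:
            digit = (key >> shift) & mask
            out[offsets[digit]] = key
            offsets[digit] += 1
        keys = out

    return keys


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

With the default values the sketch performs 32 / 4 = 8 passes. A larger `bits` value means fewer passes but more buckets per pass.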
OpenCL .Net provides bindings to the OpenCL API that mirror the OpenCL 1.1 spec as closely as possible, and a higher-level abstraction of the API that’s more .Net-like.
OCLTools is a powerful yet compact suite of tools that provides developers with more alternatives to kernel compilation. OCLTools enables you to eliminate costly kernel compilation time from the run time of your application. With OCLTools, developers can embed the source code of their kernels (clear text or encrypted) directly into their program binaries, eliminating the need to distribute kernel source code in the open while still maintaining the flexibility of runtime compilation. Not only can you embed source code into your OpenCL binaries, but you can also embed precompiled kernels, effectively eliminating the additional kernel compilation overhead from the run time of your application. OCLTools comes with an offline OpenCL compiler (oclcc), ELF file generator (oclelf), encryption tool (oclcrypt), and utility library to help streamline the OpenCL kernel compilation process.
opencl-toolbox is an open source toolkit that provides seamless integration of MATLAB with OpenCL. The toolkit will consist of three parts: 1. a mex backend written in C to allow calls from MATLAB to OpenCL commands; 2. a set of MATLAB class files that represent OpenCL data objects that can be manipulated using standard MATLAB operators (e.g. overriding minus, plus, rdivide, ldivide, inv, etc.); 3. a set of OpenCL kernel files to implement more complicated algorithms (such as inverse). Current functionality includes the mex backend, support for GPU and CPU devices, MATLAB classes for buffers and kernels, and a generic clobject to convert MATLAB vectors/matrices to buffers.
OTOO is a performance-optimized particle simulation code for heterogeneous systems, based on the octree method. Its main applications are astrophysical simulations such as N-body models and the evolution of a violent merger of stars. OTOO was used to model a merger of two white dwarf stars, and proved powerful and practical for simulating the outcome of that process.
Par4All is an automatic parallelizing and optimizing compiler (workbench) for sequential C and Fortran programs. The purpose of this source-to-source compiler is to adapt existing applications to various hardware targets such as multicore systems, high performance computers and GPUs. It generates new source code and thus allows the original source code of the application to remain unchanged.
Portable Computing Language (pocl) aims to become an open source implementation of the OpenCL standard which can be easily adapted for new targets. One of the goals of the project is improving performance portability of OpenCL programs, avoiding the need for target-dependent manual optimizations. A “native” target is included, which allows running OpenCL kernels on the host (CPU).
PyOpenCL lets you access GPUs and other massively parallel compute devices from Python.
RLIPS (R Linear Inverse Problem Solver) is an R package for solving large overdetermined (stochastic) linear inverse problems. RLIPS transforms the original linear system into a simple upper triangular one by using Givens rotations. These rotations are made in parallel using OpenCL and GPUs.
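The reduction to upper triangular form can be sketched in plain Python. This is a sequential illustration of Givens rotations applied to a small overdetermined system, not the RLIPS implementation or its API; the function names `givens_triangularize` and `back_substitute` are hypothetical, and RLIPS performs the rotations in parallel with OpenCL.

```python
import math


def givens_triangularize(A, b):
    """Rotate the system A x = b (more rows than columns) into R x = y,
    where the first n rows of R are upper triangular."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # work on copies, leave the inputs untouched
    y = b[:]
    for j in range(n):
        for i in range(j + 1, m):
            if R[i][j] == 0.0:
                continue        # entry is already zero, no rotation needed
            r = math.hypot(R[j][j], R[i][j])
            c, s = R[j][j] / r, R[i][j] / r
            # Rotate rows j and i so that R[i][j] becomes zero.
            for k in range(j, n):
                top, bottom = R[j][k], R[i][k]
                R[j][k] = c * top + s * bottom
                R[i][k] = -s * top + c * bottom
            y[j], y[i] = c * y[j] + s * y[i], -s * y[j] + c * y[i]
    return R, y


def back_substitute(R, y, n):
    """Solve the upper triangular n x n system in the first n rows of R."""
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x


# Fit intercept and slope to four points that lie on the line 1 + 2t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
R, y = givens_triangularize(A, b)
print(back_substitute(R, y, 2))
```

Because each rotation is orthogonal, the rotated system has the same least-squares solution as the original one, and that solution is read off by back substitution on the top n rows.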
Ruby-OpenCL is a Ruby binding of OpenCL.
ScalaCL lets programmers run Scala code on GPUs in a very natural way (using JavaCL bindings to the OpenCL API). It also optimizes general Scala loops (on arrays, lists and inline ranges), often by a big margin, so you’ll want to use it even if you don’t care about OpenCL.
StarPU is a task programming library for CPU/GPU hybrid architectures, allowing for optimized heterogeneous scheduling, data transfers, and cluster communications.
VexCL is a vector expression template library for OpenCL. It was created to ease OpenCL development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop OpenCL applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, and sparse matrix-vector multiplication. Multi-device and even multi-platform computations are supported.