clAmdBlas Readme

Version:       1.8
Release Date:  September 2012

ChangeLog:

____________
Current Version:

Fixed:
  * Failures in the following functions on Southern Islands GPU devices:
    ssyr2, ssyr2k, strsm, strsv, ssyrk, cher, ctrsv, csymm, cher2, ztrmm.
  * Failures in the following functions on Trinity platforms:
    dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k.

Known Issues:
  * clAmdBlas can return invalid results on CPU devices.
  * The clAmdBlas 32-bit Windows libraries can return invalid results on
    GPU devices.

____________
Version 1.8.269 (Beta, clMAGMA support):

New:
  * No new routines.
  * This release was tested using the 8.961 runtime driver and the 2.6 APP
    SDK.

Known Issues:
  * The clAmdBlasTune executable has been observed to hang on Windows. If
    this happens, abort execution of the tune program; it is not required
    for correct operation of the BLAS routines (as of 8.872).
  * clAmdBlas can return invalid results on CPU devices (as of 8.961). The
    CPU device is primarily a test/debug device, and GPU devices are
    unaffected.
  * clAmdBlas can return invalid results for double precision functions
    (dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms
    (as of 8.961).
  * clAmdBlas can return invalid results (ssyr2, ssyr2k, strsm, strsv,
    ssyrk, cher, ctrsv, csymm, cher2, ztrmm) on Southern Islands GPU
    devices (as of 8.961).

____________
Version 1.7 (Beta, clMAGMA support):

New:
  * New Level 3 routines added (an 'x' implies all 4 precisions):
    CHER2K, ZHER2K
  * New Level 2 routines added (an 'x' implies all 4 precisions):
    xTPMV, xTPSV, SSPMV, DSPMV, CHPMV, ZHPMV, SSPR, DSPR, CHPR, ZHPR,
    SSPR2, DSPR2, CHPR2, ZHPR2, xGBMV, CHBMV, ZHBMV, SSBMV, DSBMV,
    xTBMV, xTBSV
  * Samples have been added for the new functions, but they have not been
    fully tested.
  * This release was tested using the 8.951 runtime driver and the 2.6 APP
    SDK.
  * Note that the documentation for the new functions is incomplete.

Known Issues:
  * The clAmdBlasTune executable has been observed to hang on Windows. If
    this happens, abort execution of the tune program; it is not required
    for correct operation of the BLAS routines (as of 8.872).
  * clAmdBlas can return invalid results on CPU devices that support AVX
    (as of 8.951). CPU devices that support up to SSE3 are unaffected. The
    CPU device is primarily a test/debug device, and GPU devices are
    unaffected.
  * clAmdBlas can return invalid results for double precision functions
    (dsyr, dsyr2, dgemv, dsyrk, dsyr2k, zsyr2k) on Trinity platforms
    (as of 8.951).
  * clAmdBlas can return invalid results (ssyr, ssyr2, strsv, ctrsv, ssyrk,
    ssyr2k, ztrmm) on Southern Islands GPU devices (as of 8.951).

____________
Version 1.6:

New:
  * New Level 3 routines added (an 'x' implies all 4 precisions):
    CSYRK, ZSYRK, CSYR2K, ZSYR2K, CHEMM, ZHEMM, CHERK, ZHERK, xSYMM
  * New Level 2 routines added (an 'x' implies all 4 precisions):
    CGEMV, ZGEMV, xTRMV, xTRSV, CHEMV, ZHEMV, SGER, DGER, CGERU, ZGERU,
    CGERC, ZGERC, CHER, ZHER, CHER2, ZHER2, SSYR, DSYR, SSYR2, DSYR2
  * For all the original functions prior to 1.6, a new API has been
    introduced with an *Ex suffix. These extended APIs add parameters that
    let users specify an offset into a matrix argument. This allows
    efficient sub-matrix indexing within a clAmdBlas routine without
    requiring expensive sub-matrix copy operations.
  * Samples have been added for the new functions.
  * Preview: Support for AMD Radeon™ HD7000 series GPUs.
  * This release was tested using the 8.92 runtime driver and the 2.6 APP
    SDK.

Known Issues:
  * The clAmdBlasTune executable has been observed to hang on Windows. If
    this happens, abort execution of the tune program; it is not required
    for correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clAmdBlas is not functioning for this release
    (as of 8.872). The CPU device is primarily a test/debug device, and GPU
    devices are unaffected.
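To make the *Ex offset parameters from the Version 1.6 notes more concrete, the following C sketch shows how a sub-matrix of a larger column-major buffer can be addressed by a base offset plus the parent's leading dimension, with no copy. It assumes column-major storage, a leading dimension named `lda`, and offsets counted in elements rather than bytes; the helpers `submatrix_offset` and `submatrix_get` are hypothetical and are not part of the clAmdBlas API.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the clAmdBlas API.
 * Returns the element offset of entry (row, col) in a column-major matrix
 * whose leading dimension (distance between consecutive columns) is lda.
 * Assumes offsets are counted in elements, not bytes. */
static size_t submatrix_offset(size_t row, size_t col, size_t lda)
{
    return row + col * lda;
}

/* Hypothetical helper: reads element (i, j) of a sub-matrix that begins at
 * element offset `off` inside a column-major parent buffer. The sub-matrix
 * keeps the parent's lda, so no data is copied; this mirrors handing a
 * routine (buffer, offset, lda) instead of extracting the sub-matrix. */
static float submatrix_get(const float *parent, size_t off, size_t lda,
                           size_t i, size_t j)
{
    return parent[off + i + j * lda];
}
```

For example, in a 6x5 matrix (lda = 6), the sub-matrix starting at row 2, column 1 begins at element offset 2 + 1 * 6 = 8. In a real call, an offset like this would be passed together with the original buffer and its leading dimension to an *Ex routine; check the clAmdBlas API reference for the exact parameter names and order.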
____________
Version 1.4:

New:
  * New Level 3 routines added: SSYRK, DSYRK, SSYR2K, DSYR2K
  * New Level 2 routines added: SGEMV, DGEMV, SSYMV, DSYMV
  * The image support functions (clAmdBlasAddScratchImage,
    clAmdBlasRemoveScratchImage) have been deprecated. Images are no longer
    required for the highest performance.
  * InstallShield is now used for APPML libraries. The default install
    location has changed from c:\amd\clAmdBlas to
    C:\Program Files (x86)\AMD\clAmdBlas. It is recommended that previous
    versions of clAmdBlas be uninstalled first.
  * Samples have been added for the new functions.
  * This release was tested using the 8.872 runtime driver and the 2.5 APP
    SDK.

Known Issues:
  * The clAmdBlasTune executable has been observed to hang on Windows. If
    this happens, abort execution of the tune program; it is not required
    for correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clAmdBlas is not functioning for this release
    (as of 8.872). The CPU device is primarily a test/debug device, and GPU
    devices are unaffected.

____________
Version 1.2:

  * The library now supports both 32-bit and 64-bit Windows and Linux
    operating systems.
  * xTRSM routines are available in 1.2.
  * clAmdBlas routines return clAmdBlasStatus error codes instead of native
    OpenCL error codes.

Fixed:
  * xTRMM routines were not properly handling implicit unit diagonal
    elements and implicit off-diagonal zero values specified by the BLAS
    parameters SIDE, UPLO, and DIAG.
  * Possible crash with the CPU device on 32-bit systems.
  * The clAmdBlasDgemm routine returned an invalid event as its last
    argument.
  * clAmdBlas routines return clAmdBlasStatus error codes instead of native
    OpenCL error codes.

Known Issues:
  * The clAmdBlasTune executable has been observed to hang on Windows. If
    this happens, abort execution of the tune program; it is not required
    for correct operation of the BLAS routines (as of 8.872).
  * The CPU device for clAmdBlas is not functioning for this release
    (as of 8.872). The CPU device is primarily a test/debug device, and GPU
    devices are unaffected.

____________________
Version 1.0:

  * Initial release

Known Issues:
  * Available only on Linux64.
  * xTRMM routines do not properly handle implicit unit diagonal elements
    and implicit off-diagonal zero values specified by the BLAS parameters
    SIDE, UPLO, and DIAG.
  * clAmdBlasDgemm returns an invalid event as its last argument.

_____________
Building the Samples:

To install the Linux versions of clAmdBlas, uncompress the initial download,
then execute the install script. For example:

    tar -xf clAmdBlas-${version}-Linux.tar.gz
      - This extracts three files into the local directory, one of which is
        an executable bash script.

    sudo mkdir /opt/clAmdBlas-${version}
      - This pre-creates the install directory in /opt with proper
        permissions, if the library is to be installed there (the default).

    ./install-clAmdBlas-${version}.sh
      - This prints an EULA and uncompresses files into the chosen install
        directory.

    cd ${installDir}/bin64
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clAmdBlasLibDir}
      - Be sure to export the library dependencies so that all external
        linkages of the client program resolve; you can create a bash script
        to help automate this procedure.

    ./example_sgemm
      - Run a simple client; one example is provided for each supported main
        BLAS function family.

The sample programs do not ship with native build files; instead, a CMake
file is shipped, and the user generates a native build file for their
system. For example:

    cd ${installDir}
    mkdir samplesBin/
      - This creates a sister directory to the samples directory that houses
        the native makefiles and the generated files from the build.

    cd samplesBin/
    ccmake ../samples/
      - ccmake is a curses-based cmake program; it takes a parameter that
        specifies the location of the source code to compile.
      - Hit 'c' to configure for the platform; ensure that the dependencies
        on external libraries are satisfied, including paths to
        'ATI Stream SDK'.
      - After dependencies are satisfied, hit 'c' again to finalize
        configuration. Then, hit 'g' to generate a makefile and exit ccmake.

    make help
      - Look at the options available for make.

    make
      - Build the sample client program.

    ./example_sgemm
      - Run a simple client; one example is provided for each supported main
        BLAS function family.

_______________________________________________________________________________

(C) 2010,2011 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD
Arrow logo, ATI, the ATI logo, Radeon, FireStream, FireGL, Catalyst, and
combinations thereof are trademarks of Advanced Micro Devices, Inc.
Microsoft (R), Windows, and Windows Vista (R) are registered trademarks of
Microsoft Corporation in the U.S. and/or other jurisdictions. OpenCL and the
OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other
names are for informational purposes only and may be trademarks of their
respective owners.

The contents of this document are provided in connection with Advanced Micro
Devices, Inc. ("AMD") products. AMD makes no representations or warranties
with respect to the accuracy or completeness of the contents of this
publication and reserves the right to make changes to specifications and
product descriptions at any time without notice. The information contained
herein may be of a preliminary or advance nature and is subject to change
without notice. No license, whether express, implied, arising by estoppel or
otherwise, to any intellectual property rights is granted by this
publication. Except as set forth in AMD's Standard Terms and Conditions of
Sale, AMD assumes no liability whatsoever, and disclaims any express or
implied warranty, relating to its products including, but not limited to,
the implied warranty of merchantability, fitness for a particular purpose,
or infringement of any intellectual property right.
AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice. _______________________________________________________________________________