div.pre { overflow: auto; } pre { background-color: #FFFACD; padding: 5px; display: inline-block; margin-left: 10px; } blockquote { background-color: #EEFFEE; } div.side-comment { background-color: #EEEEFF; padding: 5px; display: inline-block; margin-left: 10px; font-size: 90%; } table.chart, th.chart, td.chart { border: 1px solid black; border-collapse: collapse; margin-top: 5px; } th.chart, td.chart { padding: 5px; } th.left, td.left { text-align: left; } th.right, td.right { text-align: right; } img.plain { padding: 0; border: none; background: none; -moz-box-shadow: 0px 0px 0px transparent; -webkit-box-shadow: 0px 0px 0px transparent; box-shadow: 0px 0px 0px transparent; } This post discusses the use of AMD's OpenCL BLAS library to perform matrix multiplication on a GPU.The library is accessed via a custom R package that allows for use and comparison of various BLAS libraries. The R package is implemented for OS X 10.10 and CentOS 6.6.Timing tests are included for an AMD HD 5770 GPU installed in a 2010 model Mac Pro (2.8 GHz), and for an AMD R9 280 CPU installed in a custom system with an i7-4790k CPU (4.0 GHz).Installation of clBLAS on OS XThe following assumes Xcode 6.4 has already been installed.Make a directory for the build files.cd ~mkdir clblascd clblasDownload and expand clBLAS source code archive. (Check here to verify the version number of the latest release.)curl "https://codeload.github.com/clMathLibraries/clBLAS/tar.gz/v2.4" -o "archive.tar.gz"tar zxvf archive.tar.gzThis code uses the CMake system to build the library. Download the Mac OS X binary. (Check here to verify the version number of the latest release.)curl "http://www.cmake.org/files/v3.3/cmake-3.3.0-rc3-Darwin-x86_64.dmg" -o "cmake.dmg"Open the cmake.dmg file and drag the CMake application to the Applications folder.Open the Applications folder and double-click CMake.Click the Browse Source... button and choose the clblas/clBLAS-2.4/src folder.Select the text in the box to the right of Where is the source code: and paste it into the box labeled Where to build the binaries, and then change clBLAS-2.4/src to clBLAS-2.4/binClick the Configure button.When asked, "Build directory does not exist, should I create it?" click Yes.When asked, "Specify the generator for this project," select XCode.Make sure "Use default native compilers" is selected.Click the Done button.CMake will think for a while and then show these settings in the main window:Uncheck the following boxes:BUILD_KTESTBUILD_TESTCORR_TEST_WITH_ACMLMake sure these (and only these) boxes are still checked:BLAS_PRINT_BUILD_ERRORSBUILD_RUNTIMEBUILD_SHARED_LIBSEdit the following text values:Set CMAKE_INSTALL_PREFIX to /usr/local/clblasSet CMAKE_BUILD_TYPE to ReleaseOne additional parameter is needed for the build to work. Click the Add Entry button:Fill in the dialog as follows:Thanks to Marcus Zy's blog post for above hint, which allowed for CMake to handle the dylib install-name problem automatically, instead of rassling with install_name_tool e.g., as described here.Click the Configure button. When CMake is done, the settings background should no longer be red.The following configuration warnings may appear and can be ignored:The clBLAS ktest requires boost to be installedThe clBLAS client requires boost to be installedMACOSX_RPATH is not specified...Click the Generate button. Quit CMake.I first tried using the Unix makefiles generator instead of Xcode. It didn't work out. I suspect this was because a dylib has knowledge of the directory it is to be located, and this information was not set correctly by CMake.To experiment with different CMake choices, the safest route is:Trash the clBLAS-2.4 directory.Re-expand the source code archive.Re-launch CMake and choose Delete Cache from the File menu.Next, create a folder for the shared library, and change the ownership to your user name.sudo mkdir -p /usr/local/clblassudo chown michael /usr/local/clblasIn the Finder, navigate to the folder clblas/clBLAS-2.4/bin; launch Xcode by double-clicking clBLAS.xcodeproj.Set the popup in the upper left corner to install.Then, use the same popup to select Edit Scheme....Set the value for Run: Info: Build Configuration to Release.In the Product menu, choose Build. Expect to to see numerous warnings.Verify that the library has been installed.ls /usr/local/clblas/libVerify that the library has the correct installation location.otool -L /usr/local/clblas/lib/libclBLAS.2.4.0.dylibThe first two lines of otool output should look like this:/usr/local/clblas/lib/libclBLAS.2.4.0.dylib:/usr/local/clblas/lib/libclBLAS.2.dylib (compatibility version 2.0.0, current version 2.4.0)Installation of clBLAS on CentOS 6The following assumes an appropriate AMD GPU and OpenCL SDK have already been installed; see Personal Linux R Server: Part IV.Download, expand, and copy the clBLAS library binary. (Check here to verify the version number of the latest release.)cd ~mkdir clblascd ~/clblaswget https://github.com/clMathLibraries/clBLAS/releases/download/v2.4/clBLAS-2.4.0-Linux-x64.tar.gztar -zxvf clBLAS-2.4.0-Linux-x64.tar.gz sudo mv clBLAS-2.4.0-Linux-x64 /optCreate a file to add clblas to the library search path:sudo sh -c 'echo "/opt/clBLAS-2.4.0-Linux-x64/lib64" >> /etc/ld.so.conf.d/clBLAS-2.4.0-Linux-x64.conf'Rebuild links and cache for shared libraries.sudo ldconfigInstallation of multiblas R Package on OS XThe following assumes clBLAS has been installed as described above.Download the multiblas package.cd ~/Downloadscurl -OL https://github.com/quadrivio/multiblas/releases/download/0.90-1/multiblas_0.90-1.tar.gzLaunch R and install package.setwd("~/Downloads")Sys.setenv(PKG_LIBS = "-framework OpenCL -L/usr/local/clblas/lib/ -lclBLAS")Sys.setenv(PKG_CPPFLAGS = "-I/usr/local/clblas/include")install.packages("multiblas_0.90-1.tar.gz", repos = NULL, type = "source")library(multiblas)Installation of multiblas R Package on Centos 6The following assumes clBLAS has been installed as described above.Download the multiblas package.cd ~/Downloadscurl -OL https://github.com/quadrivio/multiblas/releases/download/0.90-1/multiblas_0.90-1.tar.gzLaunch R and install package.setwd("~/Documents")Sys.setenv(PKG_CPPFLAGS = "-I/usr/lib64/openblas/include -I/opt/AMDAPPSDK-3.0-0-Beta/include -Dnullptr='NULL' -I/opt/clBLAS-2.4.0-Linux-x64/include")Sys.setenv(PKG_LIBS = "$(BLAS_LIBS) -L/opt/AMDAPPSDK-3.0-0-Beta/lib/x86_64 -lOpenCL -L/opt/clBLAS-2.4.0-Linux-x64/lib64 -lclBLAS")install.packages("multiblas_0.90-1.tar.gz", repos = NULL, type = "source")library(multiblas)Usage of multiblasThe purpose of this package is to make it easy to compare and use alternate implementations of BLAS matrix-multiplication functions.The function blas.lib() creates an object containing a set of functions for a particular implementation. For example, the following creates a library using functions implemented in the R language itself:base <- blas.lib(type="Base")The following creates a library using the FORTRAN API to the BLAS library in use by R. (This is always available, but supplies double-precision functions only.)lib <- blas.lib(type="R")The following creates a library using the C API to the BLAS library in use by R. (This is always available for double-precision functions; it may or may not be available for single-precision functions, depending on the particular BLAS library in use.)lib <- blas.lib(type="C")The following creates a library using calls to a naive C implementation (i.e., slow but easy-to-understand).lib <- blas.lib(type="naive")The following creates a library using calls to the clBLAS library using the default GPU device.lib <- blas.lib(type="clblas", processor="GPU")Functions are available as attributes of the library object. If lib is the object produced by any of the above calls to blas.lib(), then the functions may be used as follows:lib$crossprod(a)# equivalent to R code:# crossprod function# function(x) {# crossprod(x)# }#lib$gemm(A, B, transposeA, transposeB, C, alpha, beta)# equivalent to R code:# gemm: general matrix-multiplication function# function(A, B, transposeA = FALSE, transposeB = FALSE, C = NA, alpha = 1.0, beta = 0.0) {# if (transposeA) A <- t(A)# if (transposeB) B <- t(B)# if (is.na(C[1])) C <- 0.0# # result <- alpha * A %*% B + beta * C# return(result)# }This allows for convenient testing of the performance and accuracy of different implementations, by placing test code in a function that takes a library object as input. The test function code can be written once, and then called with different libraries.ImplementationAs of now, only the crossprod() and gemm() functions are implemented. The gemm() function is the most general matrix-multiplication function in the BLAS library, and thus serves as a useful example. The crossprod() function appears in a commonly-used R performance benchmark, and thus it may be useful in comparing standard results with results from optimized implementations.The library crossprod() function is only implemented for the special case where one argument is supplied. The built-in R function takes one or two arguments and has behavior equivalent to the following:crossprod - function(x, y = NULL) { if (is.null(y) { return(t(x) %*% x) } else { return(t(x) %*% y) }}When only one argument is supplied, the result is a symmetric matrix, which allows for the use of the optimized syrk BLAS function.Timing ResultsConfigurationThe test systems' hardware and software configurations are described in more detail in previous posts:Personal Linux R Server in a Mini-ITX Gaming Case (Part I)Personal Linux R Server (Part II: Software Setup)Personal Linux R Server (Part III: Leveling Up)Personal Linux R Server (Part IV: Final Upgrades)Improved R Performance with OpenBLAS and vecLib LibrariesGFLOPS calculationFor general matrix multiplication, the resulting matrix contains ncolB x nrowA values, and each value is the result of the dot product of two vectors