Presentation

High Performance Batch SVD Using GPUs
Description

We consider the problem of computing the singular value decomposition (SVD) of many relatively small matrices on GPUs. This is an essential component of scientific applications such as computational chemistry and low-rank approximation. Our approach is based on the parallel one-sided Jacobi algorithm, which exposes a large degree of parallelism and relies heavily on compute-bound level-3 BLAS operations such as matrix multiplication. We follow two design strategies. The first targets very small matrices and performs the entire SVD in a single GPU kernel. The second uses a blocked version of the parallel Jacobi algorithm, which supports matrices of arbitrary dimensions. The proposed solution handles any matrix shape (square, tall-skinny, or short-wide), imposes no limits on the matrix dimensions, and outperforms state-of-the-art solutions. This work is set to be released in the MAGMA library.
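To make the core idea concrete, here is a minimal pure-Python sketch of the one-sided (Hestenes) Jacobi SVD. Column pairs are ordered with a round-robin schedule, which is one common way to expose the parallelism the abstract refers to: within a round no column appears twice, so all rotations in that round are independent. This is not the MAGMA implementation. It is unblocked and runs on the CPU, and the names `round_robin_pairs`, `one_sided_jacobi_svd`, `svd_any_shape`, and `batch_svd` are hypothetical. Roughly speaking, a blocked variant would orthogonalize pairs of column blocks instead of single columns, turning the updates into matrix-matrix multiplies (level-3 BLAS).

```python
import math


def round_robin_pairs(n):
    """Split all column pairs (p, q), p < q, into rounds of disjoint pairs.

    Round-robin ("circle") tournament schedule: no column appears twice in a
    round, so the rotations of one round could run concurrently on a GPU.
    """
    players = list(range(n)) + ([None] if n % 2 else [])  # pad odd n with a bye
    k = len(players)
    rounds = []
    for _ in range(k - 1):
        pairs = []
        for i in range(k // 2):
            p, q = players[i], players[k - 1 - i]
            if p is not None and q is not None:
                pairs.append((min(p, q), max(p, q)))
        rounds.append(pairs)
        # Keep players[0] fixed and rotate the others by one position.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """Thin SVD of an m x n matrix with m >= n: A = U diag(sigma) V^T.

    Rotates column pairs of W = A V until all columns of W are mutually
    orthogonal; then sigma_j = ||W[:, j]|| and U[:, j] = W[:, j] / sigma_j.
    """
    m, n = len(a), len(a[0])
    w = [[float(x) for x in row] for row in a]  # W starts as a copy of A
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    rounds = round_robin_pairs(n)
    for _ in range(max_sweeps):
        converged = True
        for pairs in rounds:
            # Pairs in one round touch disjoint columns; this sketch simply
            # processes them one after another.
            for p, q in pairs:
                alpha = sum(w[k][p] * w[k][p] for k in range(m))
                beta = sum(w[k][q] * w[k][q] for k in range(m))
                gamma = sum(w[k][p] * w[k][q] for k in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal enough
                converged = False
                # Rotation that diagonalizes [[alpha, gamma], [gamma, beta]].
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to W and V, preserving A V = W.
                for mat, rows in ((w, m), (v, n)):
                    for k in range(rows):
                        xp, xq = mat[k][p], mat[k][q]
                        mat[k][p] = c * xp - s * xq
                        mat[k][q] = s * xp + c * xq
        if converged:
            break
    sigma = [math.sqrt(sum(w[k][j] ** 2 for k in range(m))) for j in range(n)]
    # A zero singular value leaves a zero column in U in this sketch.
    u = [[w[k][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in range(n)] for k in range(m)]
    return u, sigma, v


def svd_any_shape(a):
    """Handle short-wide input by factoring A^T = U S V^T, so A = V S U^T."""
    if len(a) >= len(a[0]):
        return one_sided_jacobi_svd(a)
    at = [list(col) for col in zip(*a)]
    u, sigma, v = one_sided_jacobi_svd(at)
    return v, sigma, u


def batch_svd(matrices):
    """Hypothetical stand-in for a batched GPU call: matrices are independent."""
    return [svd_any_shape(a) for a in matrices]
```

For example, `batch_svd([[[3.0, 0.0], [4.0, 5.0]]])` yields singular values √5 and √45 for that single matrix; the sketch returns them unsorted, so sort them if a descending order is needed.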