Presentation
Bridging the Gap Between Unstructured SpMM and Structured Sparse Tensor Cores
Description: The acceleration of Sparse-dense Matrix Multiplication (SpMM) using Tensor Cores (TCs) in GPUs has recently garnered significant attention. TCs are designed for block-wise matrix multiplication; however, block partitioning of general unstructured sparse matrices often results in low density within blocks, causing a substantial waste of computational resources. Sparse Tensor Cores (SpTCs) can mitigate this issue by skipping 50% of values as zeros; however, SpTCs are limited to strict 2:4 or 1:2 structured sparsity. To bridge this gap, we propose MP-SpMM, a novel matching-and-padding approach that transforms general sparse matrices into structured sparsity, drawing inspiration from the maximum matching problem in graph theory. Moreover, we introduce a novel storage format and a highly optimized GPU kernel that fully exploit the capabilities of SpTCs. Extensive experiments on modern GPUs demonstrate that MP-SpMM outperforms the state-of-the-art SpMM libraries DTC-SpMM and RoDe, with average speedups of 2.42x (up to 7.65x) and 1.92x (up to 8.60x), respectively.
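To make the 2:4 structured-sparsity constraint mentioned above concrete, the sketch below shows what SpTC hardware requires: at most 2 nonzeros in every group of 4 consecutive values. This is not the MP-SpMM algorithm (which uses matching and padding to reach this pattern without dropping values); `prune_2_4` is a hypothetical helper that simply enforces the pattern by keeping the 2 largest magnitudes per group.

```python
import numpy as np

def prune_2_4(row):
    # Enforce the 2:4 structured-sparsity pattern SpTCs require:
    # within each group of 4 consecutive values, keep at most 2
    # nonzeros (here, the 2 largest magnitudes) and zero the rest.
    # Note: MP-SpMM itself pads/matches rather than prunes; this is
    # only an illustration of the target pattern.
    row = np.asarray(row, dtype=float).copy()
    assert row.size % 4 == 0, "length must be a multiple of 4"
    for g in range(0, row.size, 4):
        group = row[g:g + 4]
        drop = np.argsort(np.abs(group))[:2]  # 2 smallest magnitudes
        group[drop] = 0.0
    return row

dense = np.array([1.0, -3.0, 0.5, 2.0, 0.0, 4.0, 0.0, 0.0])
print(prune_2_4(dense))  # -> [ 0. -3.  0.  2.  0.  4.  0.  0.]
```

Because every 4-wide group then holds at most 2 nonzeros, only the nonzero values and 2-bit per-element indices need to be stored, which is what lets SpTCs skip half the multiply-accumulate work.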
Event Type: Paper
Time: Tuesday, 18 November 2025, 4:37pm - 5:00pm CST
Location: 263-264
Topic: HPC for Machine Learning

