Presentation
Sync-Free GPU Parallelization of Sparse Kernels from Sequential Python Code
Description
Sparse matrix kernels such as SpMV, SpTRSV, and Gauss-Seidel are critical in scientific computing, AI, and engineering, but they remain difficult to parallelize due to irregular memory access patterns. Traditional compiler techniques assume affine array accesses, which do not hold in sparse formats like CSR and CSC. As a result, existing compilers often leave sparse code under-optimized, missing significant opportunities for parallelism.
We present a sync-free, runtime-based transformation that automates loop parallelization for sparse kernels with loop-carried dependencies. Our approach traces memory reads and writes to construct dependence sets, then generates Triton kernels that use flag arrays to enforce correctness without global synchronization. This method generalizes across sparse kernels by leveraging properties such as associativity and affine simplifications, enabling efficient parallel execution.
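To make the flag-array idea concrete, here is a minimal sequential sketch of a sync-free sparse lower-triangular solve (SpTRSV) on a CSR matrix. This is plain NumPy rather than the generated Triton code, and the function name `sptrsv_flag` and CSR layout are illustrative choices, not the authors' implementation; on a GPU, each row would run in its own thread and spin-wait on the flags where this version asserts them.

```python
import numpy as np

def sptrsv_flag(indptr, indices, data, b):
    """Solve L x = b for lower-triangular L in CSR form.

    ready[j] is the per-row flag: it flips to True once x[j] is
    finalized. A GPU thread for row i would spin until every
    off-diagonal dependency's flag is set; sequentially, rows run
    in order, so the flags are always already set when checked.
    """
    n = len(b)
    x = np.zeros(n)
    ready = np.zeros(n, dtype=bool)  # flag array: is x[j] finished?
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]          # diagonal entry L[i, i]
            else:
                assert ready[j]         # GPU thread would spin-wait here
                acc -= data[k] * x[j]   # consume finished dependency
        x[i] = acc / diag
        ready[i] = True                 # publish: dependents may proceed
    return x
```

Because correctness hinges only on each row seeing its dependencies' flags before reading their values, no global barrier between dependence levels is needed, which is the sync-free property the abstract describes.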
We demonstrate our work with sparse triangular solves and related kernels, and will present performance results, methodology, and case studies in the poster session.

Event Type
Research and ACM SRC Posters
Time
Tuesday, 18 November 2025, 8:00am - 5:00pm CST
Location
Second Floor Atrium
