Presentation
Scaling Singular Values Beyond GPU Memory Limits: Out-of-Core, GPU-Accelerated, and Unified Across Data Precision and Hardware
Description
We present a unified, out-of-core, GPU-accelerated singular value solver that achieves performance portability across diverse hardware platforms and data precisions for datasets exceeding GPU memory. The singular value decomposition (SVD) is fundamental for processing large-scale datasets, yet the diversity of computing architectures and the proliferation of precision formats pose significant challenges in heterogeneous environments. Traditional HPC libraries require separate implementations for each architecture and precision, limiting scalability and usability. Building on our previous work, where we developed an open-source unified solver achieving performance comparable to vendor-optimized libraries across multiple precisions and GPU platforms, we extend this capability to handle larger-than-memory datasets. We adapt a QR-based communication-hiding strategy to improve the compute-to-communication ratio and leverage Julia's multiple dispatch for seamless backend integration. Our implementation significantly outperforms CPU-based LAPACK and remains only 3–5× slower than GPU-resident solvers across different hardware and data precision configurations.
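As a rough illustration of why a QR-based approach suits larger-than-memory data, the sketch below streams a tall matrix through memory one row block at a time and keeps only a small triangular factor R between blocks. Because each step preserves the Gram matrix of the rows seen so far, the singular values of the final R equal those of the full matrix. This is not the solver described above, which runs on GPUs and is written in Julia; it is a CPU-only NumPy sketch, and the function name `streaming_singular_values` is hypothetical.

```python
import numpy as np


def streaming_singular_values(blocks):
    """Singular values of a matrix supplied as an iterable of row blocks.

    Only the current block and one small triangular factor are held in
    memory at a time (illustrative sketch, not the presented solver).
    """
    r = None
    for block in blocks:
        block = np.asarray(block, dtype=np.float64)
        # Stack the running R factor on top of the new rows, then
        # re-triangularize; mode="r" returns only the R factor.
        stacked = block if r is None else np.vstack([r, block])
        r = np.linalg.qr(stacked, mode="r")
    if r is None:
        raise ValueError("at least one block is required")
    # R has the same singular values as the full stacked matrix.
    return np.linalg.svd(r, compute_uv=False)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 8))
    # Pretend each 100-row slice is loaded from disk on demand.
    chunks = (a[i:i + 100] for i in range(0, 1000, 100))
    sv = streaming_singular_values(chunks)
    print(np.allclose(sv, np.linalg.svd(a, compute_uv=False)))
```

In an out-of-core GPU setting, the next block would typically be transferred while the current one is being factorized, which is where overlapping communication with computation pays off.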

Event Type
Best Poster Presentations (Research, ACM SRC Grad/Undergrad)
Time
Wednesday, 19 November 2025, 3:30pm - 3:45pm CST
Location
230

