Presentation
TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU
Session
Machine Learning: Methods
Description
Fourier neural operators (FNOs) are widely used for learning partial differential equation solution operators. However, FNOs lack architecture-aware optimizations: their Fourier layers execute FFT, filtering, GEMM, zero padding, and iFFT as separate stages, incurring multiple kernel launches and significant global memory traffic. We propose TurboFNO, the first fully fused FFT-GEMM-iFFT GPU kernel with built-in FFT optimizations. We first develop FFT and GEMM kernels from scratch, achieving performance comparable to cuBLAS and cuFFT. Additionally, our FFT integrates built-in high-frequency truncation, input zero-padding, and pruning features to avoid additional memory copy kernels. To fuse FFT and GEMM, we propose an FFT variant in which a threadblock iterates over the hidden dimension to align with GEMM's $k$-loop, along with two shared memory swizzling patterns that ensure 100% bank utilization when forwarding FFT output to GEMM and retrieving results for iFFT. Experimental results show TurboFNO outperforms PyTorch, cuBLAS, and cuFFT by up to 150%.
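For orientation, the five stages the description names (FFT, filtering, GEMM, zero padding, iFFT) can be sketched as an unfused NumPy reference. This is not the TurboFNO kernel. It is a minimal illustration of the baseline pipeline that a fused kernel would replace, under several assumptions not taken from the abstract: a 1D real-valued input, one weight matrix per retained mode (the common FNO formulation), and the hypothetical names `fourier_layer_1d`, `weights`, and `modes`.

```python
import numpy as np


def fourier_layer_1d(x, weights, modes):
    """Unfused reference of one 1D Fourier layer (illustrative only).

    x:       real array of shape (batch, hidden_in, n)
    weights: complex or real array of shape (modes, hidden_in, hidden_out)
             (per-mode weights are an assumption of this sketch)
    modes:   number of low-frequency modes kept after truncation
    """
    batch, hidden_in, n = x.shape
    hidden_out = weights.shape[2]
    n_freq = n // 2 + 1  # length of the one-sided spectrum of a real signal

    # Stage 1: FFT along the spatial axis.
    x_hat = np.fft.rfft(x, axis=-1)  # (batch, hidden_in, n_freq)

    # Stage 2: filtering, i.e. high-frequency truncation.
    x_low = x_hat[:, :, :modes]  # (batch, hidden_in, modes)

    # Stage 3: GEMM over the hidden dimension, one matrix per mode.
    y_low = np.einsum("bim,mio->bom", x_low, weights)  # (batch, hidden_out, modes)

    # Stage 4: zero padding back to the full one-sided spectrum.
    y_hat = np.zeros((batch, hidden_out, n_freq), dtype=np.complex128)
    y_hat[:, :, :modes] = y_low

    # Stage 5: inverse FFT back to the spatial domain.
    return np.fft.irfft(y_hat, n=n, axis=-1)  # (batch, hidden_out, n)
```

In a straightforward implementation, each stage materializes an intermediate array, which loosely mirrors the separate kernel launches and global memory round trips the description attributes to unfused Fourier layers.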
Event Type
Paper
Time
Wednesday, 19 November 2025, 11:37am - 12:00pm CST
Location
261-262-265-266
Architectures & Networks