Presentation

Local vs. Global FFT Approaches for High-Performance Ultrasound Simulation on Multi-GPU Systems
Description

Simulating wave propagation with the Fourier collocation method is computationally intensive due to its reliance on discrete Fourier transforms (DFTs). While DFTs enable near-minimal spatial discretization, they scale poorly on modern high-performance computing systems. This work evaluates two multi-GPU strategies for three-dimensional simulations: a Global FFT approach using distributed transforms, and a Local FFT approach based on domain decomposition with halo exchanges. Experiments were performed on a system with eight NVIDIA A100 GPUs connected via NVSwitch. Precision tests show that the Local FFT approach maintains errors around 0.1% when the halo covers the local PML region. Performance results demonstrate that the Local FFT approach achieves lower runtimes and significantly reduced communication overhead compared to the Global FFT approach, particularly for larger domains. These findings indicate that Local FFT decomposition is a promising strategy for scalable, large-scale multi-node ultrasound simulations.
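
To illustrate why the method leans so heavily on DFTs, the following is a minimal single-process sketch of the spectral derivative at the core of Fourier collocation, written in one dimension with NumPy. It is not the implementation evaluated in this work: the function name `spectral_derivative`, the grid size, and the test signal are assumptions chosen for illustration, and the sketch covers neither the multi-GPU decomposition, the halo exchange, nor the PML.

```python
import numpy as np


def spectral_derivative(f, dx):
    """Differentiate a periodic signal by multiplying its DFT by i*k."""
    n = f.size
    # Angular wavenumbers in the ordering used by np.fft.fft.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))


# Hypothetical test problem: sin(3x) on a periodic grid covering [0, 2*pi).
n = 64
dx = 2.0 * np.pi / n
x = np.arange(n) * dx
f = np.sin(3.0 * x)

df = spectral_derivative(f, dx)

# Compare against the analytic derivative 3*cos(3x).
max_error = np.max(np.abs(df - 3.0 * np.cos(3.0 * x)))
print(f"max abs error: {max_error:.2e}")
```

Every spatial derivative requires a forward and an inverse transform over the whole axis. When the grid is split across GPUs, a Global FFT therefore needs data from every device, which is the communication cost the Local FFT approach aims to reduce by transforming each subdomain separately and exchanging only halos.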