
Presentation

High-Performance Python for Pixelated LArTPC Simulation: Scale on NERSC (Perlmutter) and TACC (Vista)
Description

We present a Python-native, GPU-accelerated simulation of pixelated LArTPC detectors (larnd-sim), built with Numba and CuPy and scaled on NERSC Perlmutter (AMD Milan + A100) and TACC Vista (Arm64 + GH200). Guided by profiling with Nsight Systems and Nsight Compute, we reshape data (jagged arrays, sub-batching), reduce allocations and host-device transfers through buffer reuse, and tune kernels (grid/block sizing, register ceilings). A targeted refactor replaces Python loops with vectorized bulk operations and moves function evaluations out of kernels into precomputed lookup tables, cutting both CPU overhead and in-kernel GPU math. Runs show peak-memory reductions of more than 50% and speedups above 1.5x, retained at scale. These profiling techniques and optimization strategies generalize to other accelerated Python workloads.
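As a minimal sketch of the lookup-table refactor described above, the snippet below replaces per-element evaluation of an expensive function with a precomputed table indexed by vectorized arithmetic. The function, grid bounds, and names here are illustrative assumptions, not larnd-sim's actual code, and NumPy stands in for CuPy (the two share this indexing API, so the same pattern runs on a GPU by swapping the import).

```python
import numpy as np

def response(t):
    # Stand-in for an expensive per-sample function that would
    # otherwise be evaluated inside a GPU kernel.
    return np.exp(-0.5 * t**2)

# Precompute the function once on a fixed grid covering the input domain.
t_grid = np.linspace(-5.0, 5.0, 10_001)
lut = response(t_grid)
dt = t_grid[1] - t_grid[0]

def response_lut(t):
    # Vectorized nearest-neighbor lookup: one multiply, round, clip,
    # and gather per element instead of an exp() evaluation.
    idx = np.clip(np.round((t - t_grid[0]) / dt).astype(np.int64),
                  0, lut.size - 1)
    return lut[idx]

samples = np.random.default_rng(0).uniform(-4.0, 4.0, size=1_000_000)
max_err = np.max(np.abs(response(samples) - response_lut(samples)))
# Nearest-neighbor error is bounded by max|f'| * dt/2 ≈ 3e-4 here.
assert max_err < 1e-3
```

The table density trades memory for accuracy; with CuPy the table lives in device memory, so the gather stays on the GPU while the transcendental math disappears from the kernel.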