Performance portable batched linear algebra kernels for transport sweeps using Kokkos
Description
This paper describes the development of performance-portable batched linear algebra kernels for SN-DG neutron transport sweeps using Kokkos. We propose a new sweep algorithm for GPUs that relies on batched linear algebra kernels. We implement an optimized batched gesv solver for small linear systems that builds upon state-of-the-art algorithms. Our implementation achieves high performance by minimizing global memory traffic and maximizing the amount of computation performed at compile time. We assess the performance of the batched gesv kernel on NVIDIA and AMD GPUs and show that our custom implementation outperforms state-of-the-art linear algebra libraries on these architectures. The performance of the new GPU sweep implementation is assessed on the H100 and MI300A GPUs. We demonstrate that our GPU implementation achieves high performance on both architectures and is competitive with an optimized multithreaded CPU implementation on a 128-core CPU.
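To illustrate what moving work to compile time can mean for a batched gesv, here is a minimal serial C++ sketch. It is not the authors' kernel, which the abstract does not show. The system size N is a template parameter, so every loop bound is a constant the compiler may unroll, and each small system is solved by Gaussian elimination with partial pivoting. The names Mat, Vec, gesv_fixed, and batched_gesv are hypothetical. The row-major layout and the one-system-per-thread (or per-team) GPU mapping mentioned in the comments are assumptions. The sketch does not model the global-memory optimizations the abstract describes.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical fixed-size types: an N x N matrix stored row-major and a
// length-N vector. N is known at compile time, so loop trip counts are
// constants the compiler is free to unroll.
template <int N>
using Mat = std::array<double, N * N>;

template <int N>
using Vec = std::array<double, N>;

// Solve A x = b in place (b is overwritten with x) using Gaussian
// elimination with partial pivoting. A is overwritten by its upper
// triangular factor. Returns false if a zero pivot is found (A singular).
template <int N>
bool gesv_fixed(Mat<N>& A, Vec<N>& b) {
  for (int k = 0; k < N; ++k) {
    // Partial pivoting: pick the row i >= k with the largest |A(i,k)|.
    int p = k;
    for (int i = k + 1; i < N; ++i)
      if (std::fabs(A[i * N + k]) > std::fabs(A[p * N + k])) p = i;
    if (A[p * N + k] == 0.0) return false;
    if (p != k) {
      for (int j = 0; j < N; ++j) std::swap(A[k * N + j], A[p * N + j]);
      std::swap(b[k], b[p]);
    }
    // Eliminate the entries below the pivot and update the right-hand side.
    for (int i = k + 1; i < N; ++i) {
      const double l = A[i * N + k] / A[k * N + k];
      for (int j = k; j < N; ++j) A[i * N + j] -= l * A[k * N + j];
      b[i] -= l * b[k];
    }
  }
  // Back substitution on the upper triangular factor.
  for (int i = N - 1; i >= 0; --i) {
    double s = b[i];
    for (int j = i + 1; j < N; ++j) s -= A[i * N + j] * b[j];
    b[i] = s / A[i * N + i];
  }
  return true;
}

// Batched driver: the systems are independent. On a GPU, the loop body is
// what one would assign to a thread or team (for example inside a
// Kokkos::parallel_for); here it is a plain serial loop.
// Returns the number of systems that could not be solved.
template <int N>
std::size_t batched_gesv(std::vector<Mat<N>>& A, std::vector<Vec<N>>& b) {
  assert(A.size() == b.size());
  std::size_t failures = 0;
  for (std::size_t e = 0; e < A.size(); ++e)
    if (!gesv_fixed<N>(A[e], b[e])) ++failures;
  return failures;
}
```

Because N appears inside `N * N`, it cannot be deduced from the arguments, so callers write `batched_gesv<4>(A, b)` explicitly. That same explicit size is what lets the compiler specialize each solve.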
