Presentation

C++ Standard Parallelism for GPU Programming in a Particle-In-Cell Application
Description

Performance portability remains a major challenge in high-performance computing as applications increasingly target diverse GPU architectures. The C++17 standard introduced parallel algorithms, often referred to as stdpar, a high-level model intended to simplify parallel programming. NVIDIA extended this model to GPU execution within heterogeneous architectures, and AMD followed with its own implementation.

We evaluate stdpar for a classical Particle-In-Cell (PIC) method on recent NVIDIA and AMD GPUs, comparing it with Thrust, Kokkos, and SYCL in terms of runtime performance and programming productivity. The PIC implementation is dominated by a projection operator that heavily uses atomic operations. Our analysis covers both overall loop performance and the projection kernel. Despite its productivity benefits, stdpar on NVIDIA GPUs processes 1.7× fewer particles than Kokkos and 1.1× fewer than Thrust under equivalent conditions.
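To make the role of atomics in the projection step concrete, here is a minimal sketch of a particle-to-grid deposition written with a C++17 parallel algorithm. It is an illustration under stated assumptions, not the kernel evaluated in this work: the `Particle` struct, the `atomic_add` and `deposit` helpers, the periodic 1D grid, and the linear weighting are all hypothetical choices made for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical particle type for illustration; not taken from the presented code.
struct Particle {
    double x;  // position, assumed non-negative
    double q;  // charge (or statistical weight)
};

// Atomic add on a double using a compare-exchange loop. C++17 has no
// fetch_add for floating-point atomics (that arrived in C++20).
inline void atomic_add(std::atomic<double>& target, double value) {
    double expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
        // On failure, expected now holds the current value; retry.
    }
}

// Deposit each particle's charge onto a periodic 1D grid with linear weights.
// Several particles may hit the same grid point, hence the atomic updates.
template <class Policy>
void deposit(Policy&& policy, const std::vector<Particle>& particles,
             std::vector<std::atomic<double>>& rho, double dx) {
    assert(!rho.empty() && dx > 0.0);
    const std::size_t n = rho.size();
    std::atomic<double>* grid = rho.data();
    std::for_each(std::forward<Policy>(policy),
                  particles.begin(), particles.end(),
                  [=](const Particle& p) {
                      const double s = p.x / dx;          // position in cell units
                      const double cell = std::floor(s);  // left grid point
                      const double frac = s - cell;       // offset within the cell
                      const std::size_t i = static_cast<std::size_t>(cell) % n;
                      const std::size_t j = (i + 1) % n;  // periodic right neighbour
                      atomic_add(grid[i], p.q * (1.0 - frac));
                      atomic_add(grid[j], p.q * frac);
                  });
}
```

Calling `deposit(std::execution::par, particles, rho, dx)` requests parallel execution on the host. Whether this exact form offloads to a GPU, and which atomic type a given stdpar compiler expects there, depends on the toolchain and is not stated in the abstract.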

This work is ongoing, with further tuning planned. At the poster session, results will be presented with performance charts, kernel breakdowns, and code snippets to illustrate these trade-offs.