Presentation
Efficient Distributed GPU Programming for Exascale
Description
Over the past decade, GPUs have become ubiquitous in HPC installations around the world, delivering the majority of the performance of some of the largest supercomputers and steadily increasing the available compute capacity. Four exascale systems are now deployed (Frontier, Aurora, El Capitan, JUPITER), using GPUs as the core computing devices for this era of HPC. To take advantage of these GPU-accelerated systems with tens of thousands of devices, application developers need the proper skills and tools to understand, manage, and optimize distributed GPU applications. In this tutorial, participants will learn techniques to efficiently program large-scale multi-GPU systems. Programming multiple GPUs with MPI is explained in detail, and advanced tuning techniques and complementary programming models such as NCCL and NVSHMEM are also presented. Analysis tools are shown and used to motivate and implement performance optimizations. The tutorial teaches fundamental concepts that apply to GPU-accelerated systems of any vendor, using the NVIDIA platform as the example. It combines lectures and hands-on exercises, using the JUPITER system for interactive learning and discovery.
Note for Attendees
The first minutes of the tutorial will be dedicated to signing up for the supercomputer used in the tutorial. We welcome early arrivals so we can expedite the process and sign everyone up quickly.
The exact sign-up instructions will be provided in the room, but they will be similar to the material from the previous year available at https://github.com/FZJ-JSC/tutorial-multi-gpu/. If you can spare a minute, check it out in advance.
Event Type
Tutorial
Time
Sunday, 16 November 2025
8:30am - 5:00pm CST
Location
127
Livestreamed
Recorded


