Presentation
Breaking the System Noise Barrier at Exascale
Session: State of the Practice
Description: To meet the increasing demands of parallel scientific applications, supercomputers continue to grow in both scale and complexity. The fastest supercomputer in the world, El Capitan, features over a million CPU cores and tens of thousands of GPUs. Applications running on such large-scale systems are particularly susceptible to system noise, that is, interference caused by the operating system (OS) and other services running on the same compute nodes as the application.
In this paper, we address this critical performance and scalability challenge on El Capitan, enabling scientific applications to better leverage the benefits of the world's fastest supercomputer. Our strategy comprises two key components: (1) isolating system services from applications and (2) applying OS-level tuning to maintain minimal application interference. As part of this effort, we provide a distribution-independent tuning guide applicable to any Linux system, and we propose and evaluate general strategies for isolating system processes.
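The abstract does not say which mechanism is used to isolate system services. As a rough illustration only, one common Linux approach is CPU affinity: reserve a few cores for system processes and confine the application to the rest. The sketch below assumes that approach; the helper names `partition_cores` and `pin_current_process` and the choice to reserve the lowest-numbered cores are hypothetical and are not taken from the paper.

```python
import os


def partition_cores(all_cores, n_system):
    """Split cores into (system, application) sets.

    Assumption for illustration: the lowest-numbered cores are
    reserved for system services, the rest go to the application.
    """
    ordered = sorted(all_cores)
    if not 0 <= n_system < len(ordered):
        raise ValueError("n_system must leave at least one application core")
    return set(ordered[:n_system]), set(ordered[n_system:])


def pin_current_process(cores):
    """Restrict the calling process to the given cores (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not available on this platform")
    os.sched_setaffinity(0, cores)  # pid 0 means the calling process


if __name__ == "__main__":
    # Start from the cores this process is currently allowed to use.
    if hasattr(os, "sched_getaffinity"):
        available = os.sched_getaffinity(0)
    else:
        available = set(range(os.cpu_count() or 1))

    # Reserve one core for system services when more than one is available.
    n_system = 1 if len(available) > 1 else 0
    system_cores, app_cores = partition_cores(available, n_system)

    pin_current_process(app_cores)
    print("system cores:", sorted(system_cores))
    print("application cores:", sorted(app_cores))
```

Pinning the application alone does not keep system services off the application cores; a complete setup would also confine those services to the reserved cores, which this sketch leaves out.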