Presentation

System and Software Testing for Post-Exascale HPC: Challenges and Opportunities
Description

High-performance computing (HPC) and its application software face unique testing challenges. These stem from extreme concurrency, complex system architectures, and a variety of correctness requirements associated with floating-point arithmetic. Ensuring correctness, performance stability, and resilience at scale requires testing more than application codes and libraries: system software and hardware-software interactions must be tested as well. Despite its critical importance, HPC testing methodology often lags behind the rapid growth in system and application complexity. This panel brings together experts from system software, programming models, numerical libraries, and large-scale applications to discuss evolving challenges and emerging opportunities in HPC testing. Panelists will share lessons from real-world failures, explore new approaches (including formal verification, large-scale fault injection, and property-based testing), and debate whether testing should become a first-class research and operational priority for future HPC systems. The audience will be invited to help envision how testing practices must evolve to meet the demands of exascale computing and beyond.