Aurora Acceptance: A Collaborative Exascale Test Harness
Description
The Aurora exascale system is the latest supercomputer deployed at the Argonne Leadership Computing Facility (ALCF). Successfully deploying a leadership-class system is the result of years of effort by both the facility and the vendor. This extensive collaboration culminates in the successful completion of acceptance testing, a necessary step before the system is opened for general access that ensures the system is stable, accurate, and performant for scientific discovery.
The Aurora acceptance test process mimicked real-world usage of the system, stressed both the system as a whole and its individual components, and tracked any regressions that occurred. The open-source-based acceptance test harness used for the previously deployed ALCF system was extended for Aurora. This work describes the harness, its components, and its extensions. In addition, we discuss our experience expanding the harness to support additional testing modes, highlighting the challenges encountered, the lessons learned, and the enhancements we would like to see in the future.


