Presentation
Heterogeneity-Aware Task Allocation for Modern HPC Systems
Description
Modern supercomputing systems have heterogeneous node configurations, in which seemingly identical hardware shows significant performance variation due to memory capacity differences, manufacturing tolerances, and deployment conditions. This heterogeneity degrades the efficiency of scientific applications built on frameworks like AMReX, leading to substantial computational waste on leadership-class systems. We present performance-aware and relation-aware load balancing algorithms designed for scientific applications, such as those built on AMReX, running on heterogeneous HPC clusters. Our approach uses empirically measured node performance characteristics and a relative performance matrix to optimize task distribution across diverse computational resources.
Evaluation on NERSC Perlmutter with 14 representative AMReX computational kernels demonstrates 99.9% scheduling efficiency, with performance improvements of 4.4%-11.5% over traditional methods in moderate heterogeneity scenarios (A100 40GB vs. 80GB) and up to 300x improvements in extreme CPU-GPU mixed configurations, where homogeneous methods fail to utilize CPU resources effectively. The algorithms handle million-task workloads with O(n log n + nm) complexity while remaining practical to deploy.
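To make the idea of performance-aware allocation concrete, the following is a minimal sketch of a greedy longest-task-first scheduler. It is an illustration only and not the authors' algorithm. The function name `allocate`, the `(kernel, cost)` task format, and the `speed` table (a stand-in for the relative performance matrix) are all assumptions, as is reading n as the number of tasks and m as the number of nodes. Under those assumptions, sorting costs O(n log n) and scanning every node for every task costs O(nm).

```python
from typing import Dict, List, Sequence, Tuple


def allocate(
    tasks: Sequence[Tuple[str, float]],
    speed: Dict[str, Sequence[float]],
) -> Tuple[List[int], float]:
    """Greedy allocation sketch (hypothetical, not the poster's method).

    tasks: (kernel name, cost on a reference node) for each task.
    speed: speed[kernel][j] is the assumed relative speed of node j on
           that kernel, where 1.0 matches the reference node.
    Returns the node index chosen for each task and the resulting makespan.
    """
    num_nodes = len(next(iter(speed.values())))
    finish = [0.0] * num_nodes          # projected finish time per node
    assignment = [-1] * len(tasks)

    # Place the most expensive tasks first: O(n log n).
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][1], reverse=True)

    for i in order:
        kernel, cost = tasks[i]
        # Scan all nodes for the earliest projected finish: O(m) per task.
        best = min(
            range(num_nodes),
            key=lambda j: finish[j] + cost / speed[kernel][j],
        )
        finish[best] += cost / speed[kernel][best]
        assignment[i] = best

    return assignment, max(finish)


# Example: node 0 is assumed to be twice as fast as node 1.
tasks = [("stencil", 8.0), ("stencil", 4.0), ("stencil", 4.0)]
speed = {"stencil": [2.0, 1.0]}
assignment, makespan = allocate(tasks, speed)
print(assignment, makespan)
```

A scheduler that ignores node speed would split the work evenly by cost and leave the faster node idle while the slower one finishes. Weighting each placement by measured relative speed is the simplest way to avoid that in this sketch.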

Event Type
Research and ACM SRC Posters
Time
Tuesday, 18 November 2025, 8:00am - 5:00pm CST
Location
Second Floor Atrium


