Presentation
EAS-Sim: A Framework and its Methodology for the Co-Design of Multi-Objective, Energy-Aware Schedulers for AI Clusters
Session
Sustainable Supercomputing
Description
The explosive growth of large-scale Deep Learning (DL) models has made energy consumption a first-order operational cost and constraint in modern High-Performance Computing (HPC) datacenters. Existing DL schedulers, however, are largely single-objective and energy-oblivious, and struggle to balance the competing demands of performance, fairness, and Quality of Service (QoS). To address this gap, we propose a methodology for co-designing multi-objective, energy-aware schedulers, together with an associated simulation framework, EAS-Sim. The methodology provides a systematic way to extend State-of-the-Art (SOTA) scheduling heuristics with energy-efficiency objectives.
Using our framework, we design and evaluate four novel and malleable job schedulers. Our flagship energy-aware policy, Zeus, establishes a new Pareto-optimal frontier and reduces total energy consumption by ≈8-10% compared to the SOTA performance scheduler Pollux, with no statistically significant loss in system throughput. EAS-Sim is available as open source on GitHub.
Event Type
Workshop
Time
Sunday, 16 November 2025, 5:15pm - 5:30pm CST
Location
264
