Presentation

GATSched: Multi-Objective Graph Attention Networks for Energy-Efficient HPC Job Scheduling
Description: High-performance computing (HPC) systems face an urgent sustainability crisis: leading facilities consume 10–60 MW and incur multimillion-dollar annual energy costs. Traditional schedulers such as SLURM and PBS treat energy as a secondary concern, which leads to energy consumption 30%–50% above the theoretical optimum. We present GATSched, a multi-objective graph attention network (GAT) scheduler that models HPC workloads as dynamic graphs with specialized attention heads. Our approach jointly optimizes energy efficiency, performance, and resource utilization using four attention mechanisms: energy, performance, balance, and temporal. In trace-driven simulations on 389,604 production jobs across three HPC architectures, GATSched achieves a 27%–35% energy reduction while maintaining substantial resource utilization. In the poster session, we will demonstrate the GAT architecture and benchmark comparisons through interactive visualizations.
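
To make the idea of several attention heads over a job graph concrete, here is a minimal sketch of generic GAT-style attention in pure Python. It is not GATSched's implementation: the abstract gives no equations, so the scoring rule (standard GAT additive attention without a learned projection), the three-element job features, the random parameters, and the equal head weights are all assumptions made for illustration. Only the four head names come from the abstract.

```python
import math
import random

# Head names taken from the abstract; everything else below is hypothetical.
HEADS = ("energy", "performance", "balance", "temporal")


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(features, neighbors, a_src, a_dst):
    """One GAT-style head: attention weights per edge and aggregated features."""
    weights, out = {}, {}
    for i, nbrs in neighbors.items():
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in nbrs
        ]
        alpha = softmax(scores)  # normalize over the neighborhood of node i
        weights[i] = dict(zip(nbrs, alpha))
        dim = len(features[i])
        out[i] = [
            sum(a * features[j][k] for a, j in zip(alpha, nbrs))
            for k in range(dim)
        ]
    return weights, out


def multi_head(features, neighbors, params, head_weights):
    """Blend the per-head outputs with a weighted sum (assumed combination rule)."""
    combined = {i: [0.0] * len(f) for i, f in features.items()}
    for name in HEADS:
        _, out = attention_head(features, neighbors, *params[name])
        for i, vec in out.items():
            for k, v in enumerate(vec):
                combined[i][k] += head_weights[name] * v
    return combined


# Toy job graph: hypothetical normalized features [power, runtime, node_count].
features = {
    "job_a": [0.8, 0.2, 0.5],
    "job_b": [0.3, 0.9, 0.1],
    "job_c": [0.6, 0.4, 0.7],
}
# Edges (with self-loops) could stand for jobs competing for the same resources.
neighbors = {
    "job_a": ["job_a", "job_b"],
    "job_b": ["job_a", "job_b", "job_c"],
    "job_c": ["job_b", "job_c"],
}

rng = random.Random(0)  # random stand-ins for learned attention vectors
params = {
    name: (
        [rng.uniform(-1, 1) for _ in range(3)],
        [rng.uniform(-1, 1) for _ in range(3)],
    )
    for name in HEADS
}
head_weights = {name: 0.25 for name in HEADS}  # equal trade-off, an assumption

embeddings = multi_head(features, neighbors, params, head_weights)
```

In a trained scheduler the attention vectors would be learned and each head would presumably be driven by its own objective; here the heads differ only by their random parameters, and the head weights act as a simple knob for trading one objective against another.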