Presentation
Run-time Energy-Efficiency Optimization for AI and HPC Workloads
Session: Sustainable Supercomputing
Description: Effective power management is crucial for balancing high performance against environmental impact in the exascale era, particularly in datacenters that, with the rise of AI, are now dominated by massively parallel GPU systems. While many strategies rely on deep application knowledge, there is a growing need for application-agnostic approaches. We introduce a node-level power management runtime designed for regular applications, with minimal overhead and seamless deployment on any HPC/AI system. At runtime, our approach detects repetitive execution patterns via spectral analysis and then traces the energy consumed by each pattern. A simple gradient-descent optimizer gradually adjusts the GPU frequency until it finds the frequency with the lowest per-pattern energy (i.e., maximum energy efficiency). With this approach, we demonstrate up to a 15% reduction in energy consumption for equivalent computational work, with negligible overhead and minimal impact on execution time. This solution has been validated across a diverse range of AI applications, and we discuss the resulting energy savings.
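The three steps in the description (detect a repeating pattern, measure energy per pattern, then search the GPU clock for the lowest per-pattern energy) can be illustrated with the minimal Python sketch below. It is not the authors' runtime, only an illustration under stated assumptions: the function names `dominant_period`, `energy_per_pattern` and `descend_frequency` are hypothetical. The input is assumed to be a list of uniformly sampled GPU power readings in watts. Spectral analysis is done with a naive DFT periodogram. The gradient-descent step is approximated by a finite-difference walk over a sorted list of supported clock frequencies. `measure_energy` is a hypothetical callback standing in for whatever sets the clock and reads per-pattern energy on real hardware (for example, NVML clock controls and energy counters).

```python
import cmath
import math


def dominant_period(signal):
    """Return the dominant period (in samples) of a uniformly sampled signal.

    Sketch: naive DFT periodogram; returns None if no bin has non-zero power.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    best_k, best_power = 0, 0.0
    # Scan positive frequency bins k = 1 .. n/2 (up to Nyquist).
    for k in range(1, n // 2 + 1):
        coeff = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


def energy_per_pattern(power_watts, dt_s, period_samples):
    """Average energy (J) of one pattern repetition.

    Integrates sampled power (rectangle rule) over all complete periods
    in the trace and divides by the number of repetitions.
    """
    p = int(round(period_samples))
    reps = len(power_watts) // p if p > 0 else 0
    if reps == 0:
        raise ValueError("trace shorter than one pattern period")
    total_joules = sum(power_watts[: reps * p]) * dt_s
    return total_joules / reps


def descend_frequency(freqs, measure_energy, start_index=None, max_steps=50):
    """Walk over sorted clock steps toward lower per-pattern energy.

    Finite-difference stand-in for gradient descent. measure_energy(freq)
    is a hypothetical callback returning per-pattern energy at that clock.
    Stops at a local minimum (no neighbouring step is cheaper).
    """
    i = len(freqs) - 1 if start_index is None else start_index  # default: highest clock
    energy = measure_energy(freqs[i])
    for _ in range(max_steps):
        best_j, best_e = i, energy
        # Probe the neighbouring clock steps (the discrete "gradient").
        for j in (i - 1, i + 1):
            if 0 <= j < len(freqs):
                e = measure_energy(freqs[j])
                if e < best_e:
                    best_j, best_e = j, e
        if best_j == i:
            break  # neither neighbour lowers per-pattern energy
        i, energy = best_j, best_e
    return freqs[i], energy


if __name__ == "__main__":
    # Synthetic power trace (W) with a repeating 16-sample pattern, sampled every 10 ms.
    trace = [250 + 50 * math.sin(2 * math.pi * t / 16) for t in range(128)]
    period = dominant_period(trace)
    print("period (samples):", period)
    print("energy per pattern (J):", energy_per_pattern(trace, 0.01, period))
    # Toy convex energy-vs-clock model (MHz -> J) standing in for real measurements.
    best_f, best_e = descend_frequency(
        list(range(600, 2001, 100)), lambda f: (f - 1200) ** 2 / 1000 + 100
    )
    print("chosen clock (MHz):", best_f, "energy (J):", best_e)
```

The search starts at the highest supported clock and stops at the first step where neither neighbour lowers per-pattern energy. It therefore finds a local minimum, which coincides with the global one only when energy versus frequency is convex, as in the toy model above.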
Event Type: Workshop
Time: Sunday, 16 November 2025, 2:45pm - 3:00pm CST
Location: 264
