Presentation

EMLIO: Minimizing I/O Latency and Energy Consumption for Large-Scale AI Training
Description

Large-scale deep learning workloads increasingly face I/O bottlenecks as datasets exceed local storage and GPU compute outpaces network and disk speeds. While recent systems optimize data-loading time, they often ignore I/O energy costs—a critical factor at scale. We present EMLIO, an Efficient Machine Learning I/O service that minimizes both end-to-end data-loading latency (𝑇) and I/O energy consumption (𝐸) across variable-latency networked storage. EMLIO uses a lightweight data-serving daemon on storage nodes to serialize and batch raw samples, stream them over TCP with out-of-order prefetching, and integrate with GPU-accelerated (NVIDIA DALI) preprocessing on the client side. In evaluations over local disk, LAN (0.05 ms and 10 ms RTT), and WAN (30 ms RTT), EMLIO achieves up to 8.6× faster I/O and 10.9× lower energy use than state-of-the-art loaders, maintaining constant performance and energy profiles across distances. Its service-based architecture offers a scalable blueprint for energy-aware I/O in next-generation AI clouds.
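To make the architecture concrete, the following is a minimal, self-contained sketch of the serving pattern the abstract describes: a daemon on a storage node that serializes and batches raw samples and serves them over TCP, and a client that keeps several requests in flight and consumes batches as they complete (out of order). All names (`start_daemon`, `fetch`, `prefetch`, the batch size, and the framing protocol) are illustrative assumptions, not EMLIO's actual implementation, and the real system hands completed batches to a GPU preprocessing pipeline such as NVIDIA DALI rather than plain Python objects.

```python
import pickle
import socket
import struct
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

BATCH = 4  # samples per batch (illustrative choice)

def _recv_exact(sock, n):
    """Read exactly n bytes from a socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def _handle(conn, dataset):
    """Daemon side: answer one batch-index request, then close."""
    with conn:
        idx = struct.unpack("!I", _recv_exact(conn, 4))[0]
        # Serialize a contiguous slice of raw samples as one batch.
        payload = pickle.dumps(dataset[idx * BATCH:(idx + 1) * BATCH])
        conn.sendall(struct.pack("!I", len(payload)) + payload)

def start_daemon(dataset):
    """Bind a loopback TCP socket and serve batch requests in threads."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    def loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:       # server socket closed: stop serving
                return
            threading.Thread(target=_handle, args=(conn, dataset),
                             daemon=True).start()
    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()

def fetch(addr, idx):
    """Client side: request one serialized batch over TCP."""
    with socket.create_connection(addr) as s:
        s.sendall(struct.pack("!I", idx))
        n = struct.unpack("!I", _recv_exact(s, 4))[0]
        return idx, pickle.loads(_recv_exact(s, n))

def prefetch(addr, n_batches, depth=4):
    """Keep `depth` requests in flight; yield (index, batch) pairs as
    they arrive, possibly out of order relative to the request order."""
    with ThreadPoolExecutor(max_workers=depth) as ex:
        futures = [ex.submit(fetch, addr, i) for i in range(n_batches)]
        for fut in as_completed(futures):
            yield fut.result()

# Demo: 32 stand-in "samples" served as 8 batches over loopback TCP.
data = list(range(32))
srv, addr = start_daemon(data)
batches = dict(prefetch(addr, len(data) // BATCH))
srv.close()
```

Keeping multiple requests outstanding is what hides per-request RTT: with a prefetch depth of `d`, roughly `d` batches overlap one round trip, which is why a design like this can hold latency roughly flat as RTT grows from LAN to WAN distances.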