Presentation
Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads
Session
The 12th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS)
Description
Distributed cloud environments hosting data-intensive applications often experience slowdowns from network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by host-level metrics such as CPU or memory. Scheduling without considering them can cause poor placement, longer transfers, and degraded job performance. We present a network-aware scheduler that uses supervised learning to predict job completion time. Our system collects real-time telemetry from all nodes, applies a trained model to estimate the job's duration on each node, and ranks the nodes by that estimate to select the best placement. We evaluate the scheduler on a geo-distributed Kubernetes cluster deployed on the FABRIC testbed using network-intensive Spark workloads. Compared to the default Kubernetes scheduler, which relies on current resource availability alone, our supervised scheduler achieved 34–54% higher accuracy in selecting optimal nodes. The novelty of our work lies in demonstrating supervised learning for real-time, network-aware scheduling on a multi-site cluster.
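The predict-then-rank step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `NodeTelemetry` fields, the `rank_nodes` helper, and the node names are hypothetical, and `toy_predictor` is a hand-written stand-in for a trained regression model, with made-up coefficients chosen only to make the example concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class NodeTelemetry:
    """Hypothetical per-node snapshot; the real feature set is not given in the abstract."""
    name: str
    cpu_free: float        # fraction of CPU idle, 0..1
    bandwidth_mbps: float  # measured bandwidth toward the job's input data
    rtt_ms: float          # round-trip time toward the job's input data


def toy_predictor(t: NodeTelemetry) -> float:
    """Stand-in for a trained model: returns an estimated completion time in seconds.

    The coefficients are invented for illustration, not learned from data.
    """
    transfer_s = 8000.0 / max(t.bandwidth_mbps, 1.0)  # assumes 8000 Mb of input
    compute_s = 60.0 / max(t.cpu_free, 0.05)          # assumes 60 s on an idle CPU
    return transfer_s + compute_s + 0.1 * t.rtt_ms


def rank_nodes(
    telemetry: List[NodeTelemetry],
    predict: Callable[[NodeTelemetry], float],
) -> List[Tuple[str, float]]:
    """Score every node with the predictor and sort by ascending estimate."""
    scored = [(t.name, predict(t)) for t in telemetry]
    return sorted(scored, key=lambda pair: pair[1])


nodes = [
    # More idle CPU, but a slow, distant link to the data.
    NodeTelemetry("site-a", cpu_free=0.9, bandwidth_mbps=100.0, rtt_ms=80.0),
    # Busier CPU, but a fast, nearby link to the data.
    NodeTelemetry("site-b", cpu_free=0.5, bandwidth_mbps=1000.0, rtt_ms=5.0),
]

ranking = rank_nodes(nodes, toy_predictor)
best = ranking[0][0]
print(ranking)
print("selected node:", best)
```

In this toy setup, a policy that looks only at idle CPU would prefer `site-a`, while the ranking by estimated completion time prefers `site-b` because its shorter transfer outweighs its busier CPU. In a real deployment the predictor would be a model trained on past job runs, and the ranking would feed the cluster's placement decision.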
Event Type
Workshop
Time
Sunday, 16 November 2025, 12:20pm - 12:30pm CST
Location
266
short paper
