Presentation
Implementing Network-level QoS at HPC Datacenters to Enable Distributed Scientific Workflows
Session: The 12th Annual International Workshop on Innovating the Network for Data-Intensive Science (INDIS)
Description: High-performance computing (HPC) datacenters must simultaneously support real-time data streams with sub-millisecond latency and bulk transfers requiring sustained multi-gigabit throughput, and these demands compete for the same network resources. End-to-end performance guarantees are therefore essential, typically delivered through Quality of Service (QoS) mechanisms that classify traffic, reserve bandwidth, and enforce priorities across all network hops. While backbone and wide-area network providers already implement QoS, the local Ethernet ingress "last mile" inside HPC facilities generally remains best-effort, creating a critical blind spot where latency builds up and time-sensitive workflows can suffer. We address this gap with a standards-based Differentiated Services Code Point (DSCP) QoS configuration on existing leaf–spine switches: packets are marked at the host, queued per traffic class, and shaped on every hop through to the high-speed network (HSN) gateway NIC. Experiments on both intra-domain and inter-domain traffic show up to 60 percent more stable throughput and 30 percent fewer retransmissions, without hardware upgrades.
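The host-side marking step described above can be sketched with the standard socket API. This is a minimal illustration under stated assumptions, not the authors' configuration: the abstract does not say which DSCP values or marking tool were used, so the class names `realtime` and `bulk`, the helper names, and the EF (46) / AF11 (10) mapping are all hypothetical. DSCP occupies the upper six bits of the IPv4 TOS byte (the lower two bits carry ECN), which is why the code point is shifted left by two before being set.

```python
import socket

# Hypothetical class-to-DSCP mapping; the abstract does not specify the code points used.
DSCP_BY_CLASS = {
    "realtime": 46,  # EF (Expedited Forwarding), typical for latency-sensitive streams
    "bulk": 10,      # AF11 (Assured Forwarding class 1, low drop precedence)
}


def tos_byte(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper bits of the IPv4 TOS byte (ECN bits left at 0)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def marked_socket(traffic_class: str, sock_type: int = socket.SOCK_STREAM) -> socket.socket:
    """Create an IPv4 socket whose outgoing packets carry the DSCP assigned to traffic_class."""
    sock = socket.socket(socket.AF_INET, sock_type)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(DSCP_BY_CLASS[traffic_class]))
    return sock


if __name__ == "__main__":
    s = marked_socket("realtime", socket.SOCK_DGRAM)
    print("TOS byte:", s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
    s.close()
```

Marking only labels the packets; the per-class queueing and shaping on each leaf–spine hop still depends on the switches being configured to trust and act on these code points, which is vendor-specific and not shown here.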
Event Type: Workshop
Time: Sunday, 16 November 2025, 4:35pm - 4:50pm CST
Location: 266