Presentation
ACTINA: Adapting Circuit-Switching Techniques for AI Networking Architectures
Description
While traditional datacenters rely on static, electrically switched fabrics, Optical Circuit Switch (OCS)-enabled reconfigurable networks offer dynamic bandwidth allocation and lower power consumption. This work introduces a quantitative framework for evaluating reconfigurable networks in large-scale AI systems, guiding the adoption of various OCS and link technologies by analyzing trade-offs in reconfiguration latency, link bandwidth provisioning, and OCS placement. Using this framework, we develop two in-workload reconfiguration strategies and propose an OCS-enabled, multi-dimensional all-to-all topology that supports hybrid parallelism with improved energy efficiency. Our evaluation demonstrates that with state-of-the-art per-GPU bandwidth, the optimal in-workload strategy achieves up to 2.3x improvement over the commonly used one-shot approach when reconfiguration latency is low (<100 μs). However, with sufficiently high bandwidth, one-shot reconfiguration can achieve comparable performance without requiring in-workload reconfiguration. Additionally, our proposed topology improves performance–power efficiency, achieving up to 1.75x better trade-offs than Fat-Tree and 3D-Torus–based OCS network architectures.
Event Type
Paper
Time
Wednesday, 19 November 2025, 1:52pm - 2:15pm CST
Location
260-267
Architectures & Networks
BP


