Presentation

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment
Description

Large-scale distributed computing infrastructures like the Worldwide LHC Computing Grid (WLCG) require comprehensive simulation tools for performance evaluation and resource optimization. Existing simulators suffer from limited scalability, hardwired algorithms, a lack of real-time monitoring, and an inability to generate datasets suitable for machine learning.

We present CGSim, a simulation framework that addresses these limitations. Built on the validated SimGrid framework, CGSim provides high-level abstractions for modeling heterogeneous grid environments while maintaining accuracy and scalability. Key features include a modular plugin mechanism for testing custom workflow policies, interactive real-time visualization dashboards, and automatic generation of event-level datasets for AI-assisted performance modeling. A comprehensive evaluation using production ATLAS PanDA workloads demonstrates significant calibration accuracy improvements across WLCG sites. Scalability experiments show near-linear scaling for multi-site simulations, with distributed workloads achieving 6× better performance than single-site execution. CGSim enables researchers to simulate WLCG-scale infrastructures with hundreds of sites and thousands of concurrent jobs on commodity hardware within practical time budgets.