Presentation

Accelerating Exascale Scientific Discovery via In-Situ and In-Transit Data Analytics in HPC
Description
The rapid growth of multimodal data from large-scale simulations and experimental instruments is overwhelming traditional storage and analysis workflows. Post hoc, disk-based methods suffer from high latency, I/O bandwidth bottlenecks, and inefficient resource use, which slows scientific insight. This work explores a hybrid in-situ and in-transit framework that embeds computation within the memory and storage hierarchy of HPC systems. In-situ processing performs filtering, reduction, or analysis directly at the data source, using node-local memory and accelerators. In-transit processing complements this by using intermediate layers, such as burst buffers or dedicated resources, for asynchronous analytics, balancing resource use between simulation and analysis.
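A minimal sketch of the in-situ/in-transit split, assuming a toy 1-D simulation field and using Python's standard-library `threading` and `queue` modules in place of burst buffers or dedicated staging resources. The in-situ step reduces each timestep at the source to summary statistics plus a strided subsample; the in-transit step consumes those reduced records asynchronously on a separate worker, so the simulation loop blocks only when the bounded staging queue is full. The function names (`simulate_step`, `reduce_in_situ`, `in_transit_worker`, `run`), the reduction choices, and the threshold are illustrative assumptions, not the framework's actual interface.

```python
import math
import queue
import threading


def simulate_step(step, n=1000):
    """Hypothetical simulation: returns a 1-D field for one timestep."""
    return [math.sin(0.01 * i + 0.1 * step) for i in range(n)]


def reduce_in_situ(step, field, stride=100):
    """In-situ reduction at the data source: keep statistics and a strided subsample."""
    return {
        "step": step,
        "min": min(field),
        "max": max(field),
        "mean": sum(field) / len(field),
        "sample": field[::stride],
    }


def in_transit_worker(staging, results):
    """In-transit analytics: consume reduced records asynchronously until a None sentinel."""
    while True:
        record = staging.get()
        if record is None:
            break
        # Example analysis (assumed): flag timesteps whose value range exceeds a threshold.
        results.append((record["step"], record["max"] - record["min"] > 1.5))


def run(num_steps=5):
    staging = queue.Queue(maxsize=4)  # bounded buffer standing in for a burst buffer
    results = []
    worker = threading.Thread(target=in_transit_worker, args=(staging, results))
    worker.start()
    for step in range(num_steps):
        field = simulate_step(step)
        # Only the reduced record leaves the "node"; the full field is discarded here.
        staging.put(reduce_in_situ(step, field))
    staging.put(None)  # signal the in-transit worker to stop
    worker.join()
    return results
```

Calling `run()` returns one `(step, flag)` pair per timestep. The bounded queue provides back-pressure: if analysis falls behind, the simulation is throttled rather than letting staged data grow without limit.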
Our architecture integrates Apache Ignite’s in-memory data grid with Apache Spark’s distributed computing engine and containerized microservices to enable real-time data ingestion, fusion, and ML-driven analysis. Preliminary results show reduced latency, efficient CPU–memory utilization, and strong scalability. Case studies on NWChem molecular dynamics and E3SM climate simulations demonstrate adaptability across scientific domains, advancing data-aware, exascale-class discovery.
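To illustrate the ingestion-and-fusion path, the sketch below uses plain Python dictionaries partitioned by key hash as a stand-in for an in-memory data grid, and a per-key join as a stand-in for a distributed join. It does not use the Apache Ignite or Apache Spark APIs; the class `InMemoryGrid`, the source names, and the residual computation are hypothetical. Because records from different sources that share a key (here, a timestep) land in the same partition, fusion reduces to a local lookup within each partition, which is the property that makes co-locating related data in memory attractive.

```python
from collections import defaultdict


class InMemoryGrid:
    """Toy stand-in for a partitioned in-memory data grid (not the Apache Ignite API)."""

    def __init__(self, num_partitions=4):
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]

    def _partition(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def ingest(self, source, key, value):
        # Records from different sources that share a key land in the same partition.
        self._partition(key)[key][source] = value

    def fuse(self, sources):
        """Join records across sources per key; each partition is processed independently."""
        fused = {}
        for part in self.partitions:
            for key, by_source in part.items():
                # Keep only keys for which every requested source has a record.
                if all(s in by_source for s in sources):
                    fused[key] = tuple(by_source[s] for s in sources)
        return fused


def demo():
    grid = InMemoryGrid()
    for t in range(3):
        grid.ingest("simulation", t, 10.0 + t)  # hypothetical simulated values
    grid.ingest("instrument", 0, 10.5)  # hypothetical observations; timestep 1 is missing
    grid.ingest("instrument", 2, 11.0)
    fused = grid.fuse(["simulation", "instrument"])
    # Example analysis: observation-minus-simulation residual per fused timestep.
    return {t: obs - sim for t, (sim, obs) in sorted(fused.items())}


if __name__ == "__main__":
    print(demo())  # {0: 0.5, 2: -1.0}
```

Timestep 1 has no instrument record, so it is dropped by the join; in a real deployment, the choice between an inner join like this and a windowed or outer join would depend on how tolerant the downstream analysis is to missing modalities.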