Presentation
Accelerating Exascale Scientific Discovery via In-Situ and In-Transit Data Analytics in HPC
Description
The rapid growth of multimodal data from large-scale simulations and experimental instruments is overwhelming traditional storage and analysis workflows. Post hoc, disk-based methods suffer from latency, bandwidth bottlenecks, and inefficient resource use, slowing scientific insight. This work explores a hybrid in-situ and in-transit framework that embeds computation within the memory and storage hierarchy of HPC systems. In-situ processing performs filtering, reduction, or analysis directly at the data source using node-local memory and accelerators. In-transit processing complements this by leveraging intermediate layers such as burst buffers or dedicated resources for asynchronous analytics, balancing simulation and analysis.
Our architecture integrates Apache Ignite’s in-memory data grid with Apache Spark’s distributed computing and containerized microservices to enable real-time ingestion, fusion, and ML-driven analysis. Our preliminary results show reduced latency, efficient CPU–memory utilization, and strong scalability. Case studies on NWChem molecular dynamics and E3SM climate simulations demonstrate adaptability across domains, advancing data-aware, exascale-class discovery.
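The division of labor between in-situ and in-transit processing described above can be illustrated with a minimal sketch. It is not the presenters' implementation and does not use Apache Ignite or Apache Spark: a bounded in-process queue stands in for a burst buffer, a background thread stands in for dedicated analysis resources, and the names `in_situ_reduce`, `in_transit_worker`, and `run`, along with the threshold and the synthetic data, are all hypothetical.

```python
import queue
import threading
from statistics import fmean


def in_situ_reduce(step, values, threshold=0.0):
    """Filter and reduce one timestep at the data source (in-situ)."""
    kept = [v for v in values if v > threshold]
    return {
        "step": step,
        "count": len(kept),
        "mean": fmean(kept) if kept else None,
        "max": max(kept, default=None),
    }


def in_transit_worker(staging, results):
    """Consume reduced summaries asynchronously (in-transit analytics)."""
    running_max = None
    while True:
        summary = staging.get()
        if summary is None:  # sentinel: the producer has finished
            break
        if summary["max"] is not None:
            if running_max is None or summary["max"] > running_max:
                running_max = summary["max"]
        results.append({**summary, "running_max": running_max})


def run(num_steps=4):
    # Assumption: a bounded queue models a staging layer such as a burst buffer.
    staging = queue.Queue(maxsize=8)
    results = []
    worker = threading.Thread(target=in_transit_worker, args=(staging, results))
    worker.start()
    for step in range(num_steps):
        # Synthetic stand-in for one timestep of simulation output.
        values = [float(step + i) for i in range(5)]
        # Only the small summary leaves the "simulation" loop, not the raw values.
        staging.put(in_situ_reduce(step, values, threshold=1.0))
    staging.put(None)
    worker.join()
    return results
```

In this sketch the producer loop blocks only when the staging queue is full, which is one simple way to express the balance between simulation and analysis that the abstract mentions.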
Event Type
Workshop
Time
Monday, 17 November 2025, 4:35pm - 4:40pm CST
Location
230
Data Analytics
High Performance I/O, Storage, Archive, & File Systems
Storage
Livestreamed
Recorded
TP
W