Presentation
Orchestrating Complex HPC and AI/ML Workflows on Kubernetes Using Flux and AWS
Description
Scientific computing workflows are growing increasingly complex, combining diverse computational patterns, heterogeneous resources, and sophisticated dependencies that challenge traditional orchestration tools. Meanwhile, cloud and AI architectures are driving Kubernetes adoption for these workloads. Deploying workflow components that provide the performance and features required for HPC simulations and applications remains challenging in this environment. This tutorial demonstrates a portability layer that addresses this problem: integration of the Flux Framework with Kubernetes to efficiently manage complex scientific workflows on Amazon Web Services (AWS). Participants will learn how Flux’s hierarchical resource management and graph-based scheduling capabilities extend Kubernetes to support diverse workflows. The tutorial progresses from foundational infrastructure concepts to advanced Flux capabilities, culminating in deploying MuMMI (Multiscale Machine-learned Modeling Infrastructure), a scientific workflow that exemplifies emerging complexity by combining large-scale simulations and machine learning. Through lectures and hands-on labs using Amazon EKS, attendees will experience how this architecture supports demanding workflows while maintaining portability across on-premises, cloud, and hybrid environments. Working through practical examples, participants will gain skills they can apply to orchestrating complex workflows in a variety of computing environments. By the end, attendees will know how to build efficient, scalable, and flexible environments for complex scientific workflows using Kubernetes, Flux, and cloud infrastructure.






