AI and Scientific Research Computing with Kubernetes
Description
Kubernetes has emerged as the leading container orchestration solution (maintained by the Cloud Native Computing Foundation) and runs on resources ranging from on-prem clusters to commercial clouds. Kubernetes capabilities are available on the Expanse, Voyager, and Prototype National Research Platform (PNRP) Nautilus clusters at SDSC. These clusters support AI and scientific computing research workloads. Recently there has also been rapid growth in the use of AI resources for educational purposes; several institutions have incorporated LLMs into their curricula, leveraging Nautilus services and resources. This tutorial aims to educate AI and computational science researchers on the capabilities of Kubernetes as a resource management system compared with traditional batch systems; provide information on useful I/O and storage options and optimal use strategies for AI workloads; and demonstrate Kubernetes-based solutions that integrate LLM inference into the classroom via JupyterHub. Attendees will get an overview of the Kubernetes architecture and typical job and workflow submission procedures, use various storage options, run AI and scientific research software on both CPU and GPU resources, learn about optimal I/O strategies for AI, and run examples leveraging LLM inference services on Nautilus. Theoretical information will be paired with hands-on sessions operating on the PNRP production cluster Nautilus.
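To give prospective attendees a sense of the job submission workflow covered, here is a minimal sketch of a Kubernetes batch Job manifest that requests one GPU. This is not taken from the tutorial materials; the container image, job name, and resource limits are illustrative assumptions. The GPU is requested through the standard `nvidia.com/gpu` extended resource.

```python
import json

# Illustrative sketch (not from the tutorial): a single-container
# Kubernetes Job that requests one NVIDIA GPU and runs nvidia-smi.
# Image name, job name, and limits are assumptions for this example.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "gpu-example"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "main",
                        "image": "nvidia/cuda:12.4.0-base-ubuntu22.04",
                        "command": ["nvidia-smi"],
                        # One GPU via the nvidia.com/gpu extended resource
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }
                ],
                "restartPolicy": "Never",
            }
        },
        "backoffLimit": 0,
    },
}

# JSON is valid YAML, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(job, indent=2))
```

On a cluster such as Nautilus, a manifest like this would be submitted with `kubectl apply` and monitored with `kubectl get jobs` and `kubectl logs`; the tutorial's hands-on sessions walk through this cycle in detail.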
Event Type
Tutorial
Time
Monday, 17 November 2025, 8:30am - 12:00pm CST
Location
123
Livestreamed
Recorded