
Presentation

Principles and Practice of High-Performance Deep/Machine Learning Training and Inference
Description
Recent advances in machine learning and deep learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL frameworks, including PyTorch, TensorFlow, and cuML, enable high-performance training, inference, and deployment of various types of ML models and deep neural networks (DNNs). This tutorial provides an overview of recent trends in ML/DL and of the role that cutting-edge hardware architectures and interconnects play in moving the field forward. We will also present different DNN architectures, ML/DL frameworks, DL training and inference, and hyperparameter optimization, with a special focus on parallelization strategies for large models such as GPT, LLaMA, DeepSeek, and ViT. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures and efficiently support large-scale distributed training. We also highlight some of our co-design efforts to use MPI for large-scale DNN training on the cutting-edge CPU/GPU/DPU architectures available on modern HPC clusters. Throughout the tutorial, several hands-on exercises give attendees firsthand experience running distributed ML/DL training and hyperparameter optimization on a modern GPU cluster.