
Presentation

Scalable Softmax for Efficient Attention: Parallel and Distributed Strategies
Description

As large-scale deep learning models become integral to scientific discovery and engineering applications, it is increasingly important to teach students how to implement them efficiently and at scale. This paper presents a coding assignment focused on optimizing the Softmax function, a central component of many deep learning models, including the attention mechanisms of transformer models. The assignment is designed for an undergraduate-level Distributed Computing course (CPE 469, 10-week quarter system) and is tailored to students with little or no prior experience in machine learning.
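For reference, the Softmax at the heart of the assignment maps a vector of scores to a probability distribution (this is the standard definition, not a formula taken from the assignment's materials):

\[
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}, \qquad i = 1, \dots, n.
\]

Each numerator term is an independent exponential, and the shared denominator is a reduction over those same terms, which is what makes the computation a natural target for parallelization.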


This assignment is one of seven designed to reinforce the foundational concepts of parallel programming. It was developed as part of an inquiry-based learning approach \cite{ibl1,ibl2}, encouraging students to actively investigate, experiment, and discover solutions to real-world challenges. The assignment introduces essential deep learning concepts, then guides students through identifying independent tasks within the Softmax computation so they can implement parallel solutions using OpenMP and CUDA.

By integrating modern AI workloads into an HPC curriculum, this work equips students with both the conceptual understanding and practical experience needed to build scalable solutions in scientific computing.