Presentation

Hypertron: Efficiently Scaling Large Models by Exploring High-Dimensional Parallelization Space
Description

Large models are evolving toward massive scale, diverse model architectures (dense and sparse), and long-context processing, which makes efficiently scaling them on parallel machines very challenging. Widely used parallelization strategies are often sub-optimal because they explore only a limited strategy space. We therefore propose Hypertron, a scalable parallel large-model training framework that incorporates an unprecedented high-dimensional (up to 7D) parallelization space, a holistic scheme for efficient dimension fusion, and a comprehensive performance model to guide the high-dimensional exploration. By exploiting this space to discover optimal strategies not supported by existing frameworks, Hypertron significantly reduces memory and communication costs while improving parallel scalability. Extensive evaluations show that Hypertron achieves up to 56.7% Model FLOPs Utilization (MFU) on 2,048 new-generation Ascend NPU accelerators (with supernodes) across different large models (such as sparse 141B and dense 310B), with a 1.33x speedup over the best configuration of state-of-the-art frameworks.
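The abstract does not detail Hypertron's search procedure, but the general idea of exploring a high-dimensional parallelization space under the guidance of a performance model can be sketched roughly as follows. This is a minimal illustration, not Hypertron's actual algorithm: the dimension count, the `toy_cost` function, and its weights are all hypothetical placeholders for a real performance model.

```python
from itertools import count
from math import prod

def factorizations(n, dims):
    # Enumerate every way to split n devices across `dims` ordered
    # parallel dimensions (e.g. data, tensor, pipeline, ...).
    if dims == 1:
        yield (n,)
        return
    for f in range(1, n + 1):
        if n % f == 0:
            for rest in factorizations(n // f, dims - 1):
                yield (f,) + rest

def toy_cost(cfg):
    # Hypothetical cost model: later dimensions are assumed more
    # communication-heavy, so their degrees are weighted higher.
    weights = (1.0, 2.0, 4.0, 8.0, 8.0, 16.0, 16.0)
    return sum(w * (d - 1) for w, d in zip(weights, cfg))

def best_strategy(num_devices, dims=7):
    # Exhaustive search over the strategy space, scored by the model.
    return min(factorizations(num_devices, dims), key=toy_cost)

cfg = best_strategy(64)
print(cfg, prod(cfg))  # a 7-tuple whose product is 64
```

A real framework would replace `toy_cost` with a calibrated model of memory footprint and communication volume per dimension, and would prune the search rather than enumerate exhaustively, since the space grows quickly with device count and dimensionality.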