Presentation

Accelerating Intra-Node GPU Communication: A Performance Model for Multi-Path Transfers
Description: Optimizing GPU-to-GPU communication is a key challenge for improving performance in MPI-based HPC applications, especially when multiple communication paths are available. This paper presents a novel performance model for intra-node multi-path GPU communication within the MPI+UCX framework. The model determines the optimal configuration for distributing a single point-to-point (P2P) communication across multiple paths. By accounting for factors such as link bandwidth, pipeline overhead, and stream synchronization, it identifies an efficient path distribution strategy that reduces communication overhead and maximizes throughput. Extensive experiments on various topologies show that the model accurately finds theoretically optimal configurations and yields significant performance improvements, with an average error of less than 6% in predicting the optimal configuration for very large messages.
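To illustrate the general idea of splitting one transfer across several paths, the following is a minimal sketch and not the model from the paper. It assumes each path can be described by just two hypothetical parameters, a bandwidth and a fixed startup overhead, and it chooses shares so that all used paths finish at the same time. The function name `split_message` and the example numbers are illustrative assumptions; the paper's model additionally considers pipeline overhead and stream synchronization, which are not represented here.

```python
def split_message(size_bytes, paths):
    """Split one message across paths so that all used paths finish together.

    paths: list of (bandwidth_bytes_per_s, fixed_overhead_s) tuples.
    Assumed cost of sending s bytes on path i: overhead_i + s / bandwidth_i.
    Returns (shares, finish_time), where shares[i] is the bytes sent on path i.
    This is a simplified illustration, not the model described in the paper.
    """
    if size_bytes <= 0 or not paths:
        raise ValueError("need a positive message size and at least one path")

    active = list(range(len(paths)))
    while True:
        total_bw = sum(paths[i][0] for i in active)
        weighted_overhead = sum(paths[i][0] * paths[i][1] for i in active)
        # Common finish time if every active path carries a share.
        finish = (size_bytes + weighted_overhead) / total_bw
        # A path whose overhead alone reaches the finish time cannot help.
        too_slow = [i for i in active if paths[i][1] >= finish]
        if not too_slow:
            break
        # Drop the path with the largest overhead and recompute.
        active.remove(max(too_slow, key=lambda i: paths[i][1]))

    shares = [0.0] * len(paths)
    for i in active:
        bandwidth, overhead = paths[i]
        shares[i] = bandwidth * (finish - overhead)
    return shares, finish


if __name__ == "__main__":
    # Two hypothetical paths: equal bandwidth, one with a startup overhead.
    example_paths = [(100.0, 0.0), (100.0, 1.0)]
    print(split_message(300.0, example_paths))  # large message: both paths used
    print(split_message(50.0, example_paths))   # small message: one path only
```

Under these assumptions, a small message stays on the low-overhead path, while a large message is spread across both paths roughly in proportion to bandwidth.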