Presentation
Multi-rail RoCE, Now with more BGP!
Description
Heterogeneous compute nodes containing multiple accelerators and multiple Ethernet network injections have become common in recent years. Despite this, network injections beyond the first are often used only by application middleware, such as MPI or NCCL, that supports an RDMA API. We explain why traditional EtherChannel cannot support this use case. We further propose an alternative network configuration that allows these hardware resources to be used both by RDMA application middleware such as MPI and by other applications that use the OS-provided sockets API rather than a kernel-bypass API. This lets user applications built on less HPC-focused (but potentially more portable) APIs, as well as parallel filesystems and other tools, also benefit from the additional networking hardware available in this type of compute node.
