Presentation

Bridging FPGA and GPU over PCIe: A Low-Latency Communication Path using AVX-512
Description
We introduce a communication mechanism that bridges accelerators such as GPUs and PCIe-based FPGA devices using Programmed I/O (PIO) as an alternative to Direct Memory Access (DMA) transfers. When the FPGA operates as a Network Interface Card (NIC), it achieves a one-way latency below 2 microseconds for small messages.
Our prototype employs APEnetX, a custom FPGA-based NIC, and a CPU engine that atomically writes descriptors and payloads directly into the PCIe device's memory-mapped region using AVX-512 instructions. Additionally, a GPU peer-to-peer remapping technique enables the injection of data packets from GPU memory into the NIC's memory-mapped aperture without any DMA data movement orchestrated by the CPU. Microbenchmarks show lower latency than traditional RDMA for small packets, with a simpler software stack. This method is not limited to APEnetX: it applies to any FPGA-based NIC or accelerator exposing a PCIe-mapped control aperture, provided the device can read and transmit data from memory.
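
To make the CPU-side PIO path concrete, the following is a minimal sketch of a 64-byte descriptor-plus-payload write issued as one AVX-512 store. The descriptor layout (`pio_desc_t`) and the function names `pio_write64` and `pio_send_small` are hypothetical and are not taken from APEnetX. The sketch assumes the target slot is 64-byte aligned and, on real hardware, mapped write-combining; under those conditions a single 64-byte store is commonly expected to reach the device as one PCIe write, although the ISA does not guarantee this for plain stores. When the code is compiled without AVX-512 support it falls back to `memcpy`, which gives no such single-transaction behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical 64-byte descriptor: header plus inline payload.
 * This layout is an assumption for illustration only. */
typedef struct {
    uint32_t dest_id;      /* destination node identifier */
    uint32_t length;       /* valid payload bytes (0..48) */
    uint64_t tag;          /* opaque message tag */
    uint8_t  payload[48];  /* small message carried inline */
} __attribute__((aligned(64))) pio_desc_t;

static_assert(sizeof(pio_desc_t) == 64, "descriptor must fill one 64-byte store");

/* Copy one descriptor into a 64-byte-aligned slot of the device aperture. */
static inline void pio_write64(volatile void *slot, const pio_desc_t *d)
{
#if defined(__AVX512F__)
    __m512i v = _mm512_load_si512((const void *)d);  /* aligned 64-byte load */
    _mm512_store_si512((void *)slot, v);             /* one 64-byte store */
    _mm_sfence();                                    /* flush write-combining buffers */
#else
    /* Fallback without AVX-512: may be split into several smaller writes. */
    memcpy((void *)slot, d, sizeof *d);
#endif
}

/* Build a descriptor on the stack and push it to the device slot.
 * Returns 0 on success, -1 if the message does not fit inline. */
static inline int pio_send_small(volatile void *slot, uint32_t dest_id,
                                 uint64_t tag, const void *data, uint32_t len)
{
    pio_desc_t d;
    if (len > sizeof d.payload)
        return -1;
    memset(&d, 0, sizeof d);
    d.dest_id = dest_id;
    d.length  = len;
    d.tag     = tag;
    memcpy(d.payload, data, len);
    pio_write64(slot, &d);
    return 0;
}
```

In a real deployment `slot` would point into the NIC's memory-mapped aperture (for example, a region obtained by mapping the device BAR); for experimentation, any 64-byte-aligned buffer in ordinary memory can stand in for it. Messages larger than the inline payload would need several such stores or a different path, which this sketch does not cover.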