BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260202T201804Z
LOCATION:266
DTSTART;TZID=America/Chicago:20251117T140000
DTEND;TZID=America/Chicago:20251117T143000
UID:submissions.supercomputing.org_SC25_sess218_ws_waccpd102@linklings.com
SUMMARY:Bridging FPGA and GPU over PCIe: A Low-Latency Communication Path 
 using AVX-512
DESCRIPTION:Michele Martinelli (National Institute for Nuclear Physics (IN
 FN)); Carlotta Chiarini (National Institute for Nuclear Physics, Sapienza 
 University of Rome); Andrea Biagioni (National Institute for Nuclear Physi
 cs); Paolo Cretaro (National Institute for Nuclear Physics (Currently Unaf
 filiated)); and Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, 
 Pierpaolo Perticaroli, Francesco Simula, Luca Pontisso, Cristian Rossi, an
 d Piero Vicini (National Institute for Nuclear Physics)\n\nWe introduce a 
 communication mechanism bridging accelerators like GPUs and PCIe-based FPG
 A devices using Programmed I/O as an alternative to Direct Memory Access d
 ata transmissions: less than 2 microseconds one-way latency for small mess
 age transfers is achieved when the FPGA operates as a Network Interface
  Card (NIC).\nOur prototype employs APEnetX, a custom FPGA-based NIC, and
  a CPU
  engine that atomically writes descriptors and payloads directly into the 
 PCIe device Memory Mapped region using AVX-512 instructions. Additionally,
  a GPU peer-to-peer remapping technique enables the injection of data pac
 kets from the GPU memory into the NIC Memory Mapped aperture with no DMA-o
 rchestrated data movements by the CPU. Microbenchmarks show lower latency 
 than traditional RDMA for small packets with a simpler software stack. Thi
 s method is not limited to APEnetX: it applies to any FPGA-based NIC or ac
 celerator exposing a PCIe-mapped control aperture, provided the device can
  read and transmit data from memory.\n\nRecording: Livestreamed, Recorded\
 n\nRegistration Category: Technical Program Reg Pass, Workshop Reg Pass\n\
 nSession Chairs: Andreas Herten (Forschungszentrum Jülich, Jülich Supercom
 puting Centre (JSC)); Rabab Alomairy (Massachusetts Institute of Technolog
 y (MIT), King Abdullah University of Science and Technology (KAUST)); and 
 Jorge Luis Galvez Vallejo (Australian National University)\n\n
END:VEVENT
END:VCALENDAR
