BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260202T201809Z
LOCATION:274
DTSTART;TZID=America/Chicago:20251116T155000
DTEND;TZID=America/Chicago:20251116T161000
UID:submissions.supercomputing.org_SC25_sess201_ws_memo102@linklings.com
SUMMARY:The MALL is Open: Exploring Shared Caches and Latency in AMD CDNA™
  3 GPUs
DESCRIPTION:Andrew Tee (University of California\, Riverside\; Advanced
  Micro Devices\, Inc. (AMD))\; Nicholas Curtis and Noah Wolfe (Advanced
  Micro Devices\, Inc. (AMD))\; and Daniel Wong (University of
  California\, Riverside)\n\nThis paper presents an analysis of memory
  hierarchy latency across AMD Instinct™ MI300A\, MI300X\, and MI250X
  GPUs using a fine-grained pointer-chasing microbenchmark. We
  characterize the scalar L1 (sL1)\, L2\, AMD Infinity Cache™ referred
  to as the MALL (Memory Attached Last Level)\, and HBM (High Bandwidth
  Memory)\, revealing distinct latency levels and architectural
  trade-offs. MI300A and MI300X\, based on the CDNA3 architecture\,
  exhibit nearly identical latency profiles\, while MI250X lacks a
  MALL\, resulting in different performance characteristics. Memory
  latency remains consistent across compute partitioning modes\, but
  NUMA Partitioning per Socket (NPS) significantly impacts performance.
  In NPS4 mode\, partitioning improves locality\, reducing latency by
  up to 1.42× in MALL and 1.31× in HBM. We further analyze MALL
  contention and Translation Lookaside Buffer (TLB) behavior under
  varying parallelism levels\, identifying conditions where MALL
  performance degrades. These findings provide actionable insights for
  optimizing memory access patterns and improving performance on AMD’s
  latest GPU architectures.\n\nRecording: Livestreamed\, Recorded\n\n
 Registration Category: Technical Program Reg Pass\, Workshop Reg
  Pass\n\nSession Chairs: Stephen L. Olivier (Sandia National
  Laboratories)\, Maya Gokhale (Lawrence Livermore National Laboratory
  (LLNL))\, Ivy Peng (KTH Royal Institute of Technology)\, Kyle Hale
  (Oregon State University)\, and Ronald Minnich (Hewlett Packard
  Enterprise (HPE))\n\n
END:VEVENT
END:VCALENDAR
