Presentation
Towards Application Agnostic HPC Profiling
Description
Modern HPC systems generate large amounts of GPU and network telemetry, typically used for system health monitoring. At NERSC, we are developing a Performance API/UI that generates a job report card from this telemetry, providing an overview of a job's performance characteristics. Using DCGM counters, we report GPU memory, compute, and power usage, and present preliminary investigations of job-level network activity. Because it requires no traditional profiling tools, this application-agnostic approach helps identify resource utilization imbalances, detect anomalies such as memory leaks, and assess overall performance with no additional effort from the user.
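
As a rough illustration of the kind of check a job report card could run on GPU memory telemetry, the sketch below fits a least-squares trend to sampled memory usage and flags steady growth as a possible leak. It is not the NERSC implementation: the function name `summarize_gpu_memory`, the sample layout, and the 1 MiB/min threshold are all hypothetical, and the input is assumed to be a per-GPU series of framebuffer-used values in MiB (for example, as exposed by a DCGM field such as `DCGM_FI_DEV_FB_USED`).

```python
from statistics import mean


def summarize_gpu_memory(samples, leak_slope_mib_per_min=1.0):
    """Summarize one GPU's memory usage over a job (hypothetical helper).

    samples: list of (seconds_since_job_start, framebuffer_used_mib) pairs.
    leak_slope_mib_per_min: assumed threshold above which sustained growth
        is reported as a possible leak; a real tool would need to tune this.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")

    minutes = [t / 60.0 for t, _ in samples]
    used = [u for _, u in samples]

    # Ordinary least-squares slope of memory used versus time.
    t_bar, u_bar = mean(minutes), mean(used)
    denom = sum((t - t_bar) ** 2 for t in minutes)
    if denom == 0:
        raise ValueError("all samples share the same timestamp")
    slope = sum((t - t_bar) * (u - u_bar) for t, u in zip(minutes, used)) / denom

    return {
        "peak_mib": max(used),
        "mean_mib": u_bar,
        "slope_mib_per_min": slope,
        "possible_leak": slope > leak_slope_mib_per_min,
    }


# Synthetic data only: one flat series and one growing by 50 MiB per minute.
steady = [(i * 60, 1000.0) for i in range(10)]
growing = [(i * 60, 1000.0 + 50.0 * i) for i in range(10)]

steady_report = summarize_gpu_memory(steady)
growing_report = summarize_gpu_memory(growing)
```

A linear trend is only one simple heuristic. Applications that legitimately allocate memory in phases would need a more careful test, so the flag is best read as a prompt for closer inspection rather than a diagnosis.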

Event Type
Research and ACM SRC Posters
Time
Tuesday, 18 November 2025, 8:00am - 5:00pm CST
Location
Second Floor Atrium

