Session

Event Type
Research and ACM SRC Posters
Time
Tuesday, 18 November 2025, 8:00am - 5:00pm CST
Tags
Research & ACM SRC Posters
Registration Categories
TP
Presentations
Intelligent Surrogates Pay Attention to Data, Improving Multi-Objective HPC Optimization
Scaling Singular Values Beyond GPU Memory Limits: Out-of-Core, GPU-Accelerated, and Unified Across Data Precision and Hardware
Evaluating the Usage of Python Libraries on a Production Supercomputer
Optimizing the GPU All-Reduce Using Multiple Processes Per GPU
DiOMP-Offloading: Portable OpenMP Offloading for Distributed Heterogeneous Systems
Scalable Multi-Node Multi-GPU Datalog Engine with Energy-Aware Profiling
Accelerating Linear Solve with Mixed Precision Nested Recursive Subdivision on AI Hardware
Understanding GPU Utilization Using LDMS Data on Perlmutter
GATSched: Multi-Objective Graph Attention Networks for Energy-Efficient HPC Job Scheduling
Job Grouping-Based Intelligent Resource Recommendation Framework
AdversaGuard: A Distributed Data Poisoning Benchmark for Parallel AI
Understanding LLM Behavior on HPC Data via Mechanistic Interpretability
European Open Web Index: Large Complex Graph Visualization
Scalable Execution Framework for R on Manycore Systems
Chameleon Concierge: Retrieval-Augmented Generation (RAG) To Enhance Open Testbed Documentation
Performance Engineering of Scientific Applications with MVAPICH and TAU Using Emerging Communication Primitives
DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
Leveraging Large Language Models for Property Prediction in Polymorphic Organic Semiconductors
SRAP: Sender-Side Receiver-Aware Port Selection for High-Speed Multi-Flow TCP
Orchid: Towards Heterogeneous Batched Eigenvalue Solvers
Mitigating I/O Bottlenecks in LiDAR Pipelines by Directly Merging Neural Decompression and Semantic Segmentation
The Impact of Maximum Vector Length on Cache Management Techniques in RISC-V Vector Extension
Parallel Local Motif Counting on Large-Scale Dynamic Graphs
CROSS-HPC System Bayesian Optimization with Adaptive Transfer
Distributed Modular Digital Twin Network for High-Performance and Reliable Data Centers
Mojo: Python-Like MLIR-Based GPU Portable Science Kernels
csDF: A Double-Float Arithmetic Library for the Cerebras CS-2
ParaViz3D: MPI Trace Visualization with 3D Video
An Efficient GEMM Acceleration Method for LLM Inference with Variable-Length Sequences
An Agent-Based Viral Venture: Adaptive Tool Selection for Scalable Genomics
Real-Time ML-Based Defense Against Malicious Payload in Reconfigurable Embedded Systems
From Petabytes to Predictions: Harnessing Large-Scale NeuroBlu Mental Health Data and ML To Mitigate Medication Non-Adherence
Process-Based Predictors of Vulnerability Reintroduction
A Quantum Solver for Multidimensional Partial Differential Equations: Practical Case Studies
Inference-as-a-Service Prototype at NERSC
WONDERS: Integrating WOW, PONDER, and SCALE for Enhanced Scheduling Performance
Detecting Silent Data Corruption in Sparse Matrices Using Hardware Performance Counters
Optimizing Task-Driven Offloading in LLVM
Mixed Compute Environments with OpenCHAMI
Massively Parallel GPU Rasterizer for Next-Generation Computational Lithography
Julia with Intelligent Runtime for Heterogeneous Computing
From Legacy to Portable: An Agentic AI Workflow for Fortran Code Translation and Cross-Architecture Optimization
Building the Foundation for Machine Learning-Based Mars Weather Forecasting
Compute System Simulator: Modeling the Impact of Allocation Policy and Hardware Reliability on HPC Cloud Resource Utilization
Productive Scalable Distributed Task Scheduling Using an MPI-based Backend for Dagger
Tensor Core Accelerated Fast Multipole Method for GROMACS
Local vs. Global FFT Approaches for High-Performance Ultrasound Simulation on Multi-GPU Systems
Harmony: Converged Supercomputer Scratch and Archival Filesystems
Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference
PhySiViT: A Physics Simulation Vision Transformer
Optimizing and Extending Periodogram Computations for Astronomy
IncineRate: Multi-Modal FPGA Accelerator for SCNNs
HydraCache: LLM Inference Prefill Parallelization Through Distributed Cache Blending
Optimizing Collectives with Large Payloads on GPU-Based Supercomputers
Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls
Wafer-Scale Simulation of Mutator Allele Dynamics in Large Asexual Populations
Hardware-Aware Quantum Circuit Synthesis
Between the NIC and a Hard Place: Evaluating 400 Gb/s Ethernet for HPC Data Transfers
A Kokkos-Based Proxy of the Exascale Metagenome Assembler MetaHipMer2: A First Use of Kokkos for Computational Biology
Learning To Select Scheduling Algorithms in OpenMP
Understanding Communication Bottlenecks in Multi-Node LLM Inference
A Scalability Study of Quantum Algorithms for Dimensionality Reduction of Multidimensional Data
Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization
Divergence Prediction System for CFD Simulations
A Formal Characterization of Non-Monotonicity in Tensor Cores
Practical Viability of Translating Legacy Fortran Code to C++ Using Large Language Models
Shortcut Mixup Policy: Toward Improving Robustness and Speed in Goal-Conditioned RL
TidalMark: A Scalable Benchmark for Coastal Water Level Forecasting
A Toolbox for Load Balancing Development and Analysis in WarpX/AMReX Applications
Exploring Fine-Grained Parallelism in Data-Flow Runtime Systems on Many-Core Systems
Bridging the Quantum Coding Gap: Instruction-Tuned LLMs for Qiskit
C++ Standard Parallelism for GPU Programming in a Particle-In-Cell Application
ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem
High-Performance Sparse Attention on Tensor Cores: Fused3S and Beyond
Applying Lossy Compression Techniques to GNN Training
Enabling Real-Time, Extreme-Scale Bayesian Inference: FFT-Based GPU-Accelerated Matrix-Vector Products for Block-Triangular Toeplitz Matrices
Classifying Performance Bounds Using Machine Learning
Scalable Alternative Route Computation with ACE: A C++17 Library for HPC Traffic Simulations
When Label Propagation Outperforms BFS in Breadth-First Graph Traversal
Sync-Free GPU Parallelization of Sparse Kernels from Sequential Python Code
GNNs on Evolving Graphs: A Benchmark of Incremental Updates and Meta-Learning Approaches
Facilitating Mixed Python-Fortran HPC Codes: 4D Drift-Kinetic Simulations with Pyccel
High Performance Batch SVD Using GPUs
GPU Kernels for Mixture of Experts
Echoes of Earth: Building an Autonomous Environmental Lab for Acoustic Sensing
Time-Stepping Hamiltonian Simulation for Solving Nonlinear PDEs via a Quantum-Classical Hybrid Approach
Towards a GPU-Accelerated Web-Based Graph Rendering Framework for Large-Scale Protein Networks
Massively Parallel Bayesian Inference Framework for GPU Supercomputers: Application to Estimation of Coseismic Fault Slip
Analyzing Dataset Popularity for Optimizing In-Network Storage
Explicit Low-Order Finite-Element Wave Simulation Accelerated with Variable-Precision Computing Using INT8 Tensor Cores
Can Lossy Compression Benefit NVMe-Based I/O?
Memory-Efficient CFD Based on MPS: Effective One-Billion-Cell Resolution on a Single Node
MPI-SGX: Enabling Confidential Computing for MPI Parallel Applications with Intel SGX Technology
CUR-MoE: Portable Mixture-of-Experts with Interpretable High-Ratio Compression
Shipping HPC Ecosystems Across Platforms: Portable and Composable HPC Clusters as Code
Unmasking Performance Variability in GPU Codes on Supercomputers
Novel Graph Alignment Algorithms for Identifying Non-Determinism in Large-Scale Simulations
Can Long-Haul RDMA Benefit Federated Learning?
Using Hardware Metrics To Understand Performance of the RAJA Performance Suite Kernels in Different GPU Modes on MI300A
VaultX Merge: Breaking Memory Barriers in Proof-of-Space Plot Generation
Algorithms and Applications of Dynamic Network Analysis Using CANDY
Accelerating Scientific Workflows with LLM-Driven Compiler Optimizations for Generated High-Performance Hardware
JACC: Easy CPU/GPU Performance Portability for Scientific Applications in Julia
CATIOS: Time-Resolved I/O-Aware Job Scheduling for HPC Systems
Numerical Investigation of Radiation Hydrodynamic Instabilities at Scale with FleCSI-HARD
AutoSlim: Intelligent Automata Graph Optimization for Efficient Acceleration
Evaluating the Power-Monitoring Capabilities of Aurora
WiCAT: Reducing Congestion at Wireless Interfaces in Heterogeneous Architectures
Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-Based Matrix Inversion
Enabling Efficient Runtime Data Analysis to a Crystal Deformation Simulation
CIRE: LLVM Analysis for Floating-Point Rounding Error Affected by Precision and Optimizations
Seamless Scaling of Applications Across Programming Models
Range Search on Heterogeneous Systems with Processing-in-Memory Architecture
Accelerating AI Co-Scientists with HPC Infrastructure
An Approach for Correlating Compiler Optimizations with Runtime Performance
Characterizing Performance and Energy Trade-Offs on the Aurora Supercomputer
Configuring Large Language Models for Regional Ocean Model Development
Multi-GPU Implementation and Roofline Analysis of a Numerical Global Ocean Model
Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs
Unraveling Distant Galaxies: Analyzing IFU Data with Parsl and Academy
Enhancing Usability and Performance in Experimental Environments Management
Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs
Heterogeneity-Aware Task Allocation for Modern HPC Systems
Template Task-Based Multiresolution Analysis in Hybrid Environments
Evaluating LiDAR Compression for 3D Semantic Segmentation in Diverse Off-Road Environments on GOOSE Dataset
ScODA: An Emerging Pipeline for Evaluating Distributed Database Performance To Support Operational Data Analytics
Towards Application Agnostic HPC Profiling
Advancing EEG Signal Analysis with Quantum Machine Learning
Unified Performance Modeling Stack for Distributed GPU Applications: Complementing Analytical Insights with Machine Learning