Session

Event Type
Research and ACM SRC Posters
Time
Wednesday, 19 November 2025, 8:00am - 7:00pm CST
Tags
Research & ACM SRC Posters
Registration Categories
TP
Presentations
Accelerating AI Co-Scientists with HPC Infrastructure
A Kokkos-Based Proxy of the Exascale Metagenome Assembler MetaHipMer2: A First Use of Kokkos for Computational Biology
GNNs on Evolving Graphs: A Benchmark of Incremental Updates and Meta-Learning Approaches
DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs
Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization
Scalable Multi-Node Multi-GPU Datalog Engine with Energy-Aware Profiling
Divergence Prediction System for CFD Simulations
AdversaGuard: A Distributed Data Poisoning Benchmark for Parallel AI
GATSched: Multi-Objective Graph Attention Networks for Energy-Efficient HPC Job Scheduling
Practical Viability of Translating Legacy Fortran Code to C++ Using Large Language Models
European Open Web Index: Large Complex Graph Visualization
Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-Based Matrix Inversion
MPI-SGX: Enabling Confidential Computing for MPI Parallel Applications with Intel SGX Technology
CIRE: LLVM Analysis for Floating-Point Rounding Error Affected by Precision and Optimizations
Harmony: Converged Supercomputer Scratch and Archival Filesystems
Compute System Simulator: Modeling the Impact of Allocation Policy and Hardware Reliability on HPC Cloud Resource Utilization
CATIOS: Time-Resolved I/O-Aware Job Scheduling for HPC Systems
Novel Graph Alignment Algorithms for Identifying Non-Determinism in Large-Scale Simulations
Applying Lossy Compression Techniques to GNN Training
Productive Scalable Distributed Task Scheduling Using an MPI-based Backend for Dagger
Seamless Scaling of Applications Across Programming Models
Orchid: Towards Heterogeneous Batched Eigenvalue Solvers
Evaluating the Power-Monitoring Capabilities of Aurora
Enabling Real-Time, Extreme-Scale Bayesian Inference: FFT-Based GPU-Accelerated Matrix-Vector Products for Block-Triangular Toeplitz Matrices
Detecting Silent Data Corruption in Sparse Matrices Using Hardware Performance Counters
Configuring Large Language Models for Regional Ocean Model Development
Memory-Efficient CFD Based on MPS: Effective One-Billion-Cell Resolution on a Single Node
Wafer-Scale Simulation of Mutator Allele Dynamics in Large Asexual Populations
The Impact of Maximum Vector Length on Cache Management Techniques in RISC-V Vector Extension
Algorithms and Applications of Dynamic Network Analysis Using CANDY
Multi-GPU Implementation and Roofline Analysis of a Numerical Global Ocean Model
A Quantum Solver for Multidimensional Partial Differential Equations: Practical Case Studies
JACC: Easy CPU/GPU Performance Portability for Scientific Applications in Julia
WONDERS: Integrating WOW, PONDER, and SCALE for Enhanced Scheduling Performance
Evaluating LiDAR Compression for 3D Semantic Segmentation in Diverse Off-Road Environments on GOOSE Dataset
GPU Kernels for Mixture of Experts
Local vs. Global FFT Approaches for High-Performance Ultrasound Simulation on Multi-GPU Systems
Optimizing Task-Driven Offloading in LLVM
Understanding LLM Behavior on HPC Data via Mechanistic Interpretability
Unraveling Distant Galaxies: Analyzing IFU Data with Parsl and Academy
Heterogeneity-Aware Task Allocation for Modern HPC Systems
An Agent-Based Viral Venture: Adaptive Tool Selection for Scalable Genomics
Massively Parallel GPU Rasterizer for Next-Generation Computational Lithography
Enhancing Usability and Performance in Experimental Environments Management
Evaluating the Usage of Python Libraries on a Production Supercomputer
Unified Performance Modeling Stack for Distributed GPU Applications: Complementing Analytical Insights with Machine Learning
Massively Parallel Bayesian Inference Framework for GPU Supercomputers: Application to Estimation of Coseismic Fault Slip
Performance Engineering of Scientific Applications with MVAPICH and TAU Using Emerging Communication Primitives
IncineRate: Multi-Modal FPGA Accelerator for SCNNs
Chameleon Concierge: Retrieval-Augmented Generation (RAG) To Enhance Open Testbed Documentation
When Label Propagation Outperforms BFS in Breadth-First Graph Traversal
Inference-as-a-Service Prototype at NERSC
A Scalability Study of Quantum Algorithms for Dimensionality Reduction of Multidimensional Data
AutoSlim: Intelligent Automata Graph Optimization for Efficient Acceleration
Scalable Execution Framework for R on Manycore Systems
Mitigating I/O Bottlenecks in LiDAR Pipelines by Directly Merging Neural Decompression and Semantic Segmentation
Exploring Fine-Grained Parallelism in Data-Flow Runtime Systems on Many-Core Systems
Optimizing Collectives with Large Payloads on GPU-Based Supercomputers
Accelerating Linear Solve with Mixed Precision Nested Recursive Subdivision on AI Hardware
Real-Time ML-Based Defense Against Malicious Payload in Reconfigurable Embedded Systems
C++ Standard Parallelism for GPU Programming in a Particle-In-Cell Application
Tensor Core Accelerated Fast Multipole Method for GROMACS
Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference
Optimizing and Extending Periodogram Computations for Astronomy
Scalable Alternative Route Computation with ACE: A C++17 Library for HPC Traffic Simulations
Explicit Low-Order Finite-Element Wave Simulation Accelerated with Variable-Precision Computing Using INT8 Tensor Cores
Shortcut Mixup Policy: Toward Improving Robustness and Speed in Goal-Conditioned RL
Optimizing the GPU All-Reduce Using Multiple Processes Per GPU
Echoes of Earth: Building an Autonomous Environmental Lab for Acoustic Sensing
Enabling Efficient Runtime Data Analysis to a Crystal Deformation Simulation
Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls
Parallel Local Motif Counting on Large-Scale Dynamic Graphs
Numerical Investigation of Radiation Hydrodynamic Instabilities at Scale with FleCSI-HARD
CUR-MoE: Portable Mixture-of-Experts with Interpretable High-Ratio Compression
Shipping HPC Ecosystems Across Platforms: Portable and Composable HPC Clusters as Code
High Performance Batch SVD Using GPUs
Analyzing Dataset Popularity for Optimizing In-Network Storage
A Toolbox for Load Balancing Development and Analysis in WarpX/AMReX Applications
Scaling Singular Values Beyond GPU Memory Limits: Out-of-Core, GPU-Accelerated, and Unified Across Data Precision and Hardware
From Petabytes to Predictions: Harnessing Large-Scale NeuroBlu Mental Health Data and ML To Mitigate Medication Non-Adherence
ParaViz3D: MPI Trace Visualization with 3D Video
Learning To Select Scheduling Algorithms in OpenMP
Distributed Modular Digital Twin Network for High-Performance and Reliable Data Centers
csDF: A Double-Float Arithmetic Library for the Cerebras CS-2
PhySiViT: A Physics Simulation Vision Transformer
DiOMP-Offloading: Portable OpenMP Offloading for Distributed Heterogeneous Systems
WiCAT: Reducing Congestion at Wireless Interfaces in Heterogeneous Architectures
Using Hardware Metrics To Understand Performance of the RAJA Performance Suite Kernels in Different GPU Modes on MI300A
A Formal Characterization of Non-Monotonicity in Tensor Cores
Advancing EEG Signal Analysis with Quantum Machine Learning
Towards Application Agnostic HPC Profiling
Intelligent Surrogates Pay Attention to Data, Improving Multi-Objective HPC Optimization
Range Search on Heterogeneous Systems with Processing-in-Memory Architecture
Sync-Free GPU Parallelization of Sparse Kernels from Sequential Python Code
Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs
CROSS-HPC System Bayesian Optimization with Adaptive Transfer
Time-Stepping Hamiltonian Simulation for Solving Nonlinear PDEs via a Quantum-Classical Hybrid Approach
Process-Based Predictors of Vulnerability Reintroduction
Characterizing Performance and Energy Trade-Offs on the Aurora Supercomputer
Accelerating Scientific Workflows with LLM-Driven Compiler Optimizations for Generated High-Performance Hardware
Julia with Intelligent Runtime for Heterogeneous Computing
Can Lossy Compression Benefit NVMe-Based I/O?
ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem
Building the Foundation for Machine Learning-Based Mars Weather Forecasting
Job Grouping-Based Intelligent Resource Recommendation Framework
Unmasking Performance Variability in GPU Codes on Supercomputers
Template Task-Based Multiresolution Analysis in Hybrid Environments
Facilitating Mixed Python-Fortran HPC Codes: 4D Drift-Kinetic Simulations with Pyccel
TidalMark: A Scalable Benchmark for Coastal Water Level Forecasting
High-Performance Sparse Attention on Tensor Cores: Fused3S and Beyond
Hardware-Aware Quantum Circuit Synthesis
Understanding Communication Bottlenecks in Multi-Node LLM Inference
Understanding GPU Utilization Using LDMS Data on Perlmutter
Mojo: Python-Like MLIR-Based GPU Portable Science Kernels
An Approach for Correlating Compiler Optimizations with Runtime Performance
SRAP: Sender-Side Receiver-Aware Port Selection for High-Speed Multi-Flow TCP
Towards a GPU-Accelerated Web-Based Graph Rendering Framework for Large-Scale Protein Networks
From Legacy to Portable: An Agentic AI Workflow for Fortran Code Translation and Cross-Architecture Optimization
ScODA: An Emerging Pipeline for Evaluating Distributed Database Performance To Support Operational Data Analytics
Mixed Compute Environments with OpenCHAMI
HydraCache: LLM Inference Prefill Parallelization Through Distributed Cache Blending
Classifying Performance Bounds Using Machine Learning
Between the NIC and a Hard Place: Evaluating 400 Gb/s Ethernet for HPC Data Transfers
Bridging the Quantum Coding Gap: Instruction-Tuned LLMs for Qiskit
Leveraging Large Language Models for Property Prediction in Polymorphic Organic Semiconductors
Can Long-Haul RDMA Benefit Federated Learning?
An Efficient GEMM Acceleration Method for LLM Inference with Variable-Length Sequences
VaultX Merge: Breaking Memory Barriers in Proof-of-Space Plot Generation