Session

Event Type
Research and ACM SRC Posters
Time
Thursday, 20 November 2025, 8:00am - 5:00pm CST
Tags
Research & ACM SRC Posters
Registration Categories
TP
Presentations
Can Lossy Compression Benefit NVMe-Based I/O?
Enabling Real-Time, Extreme-Scale Bayesian Inference: FFT-Based GPU-Accelerated Matrix-Vector Products for Block-Triangular Toeplitz Matrices
From Petabytes to Predictions: Harnessing Large-Scale NeuroBlu Mental Health Data and ML To Mitigate Medication Non-Adherence
Optimizing and Extending Periodogram Computations for Astronomy
MPI-SGX: Enabling Confidential Computing for MPI Parallel Applications with Intel SGX Technology
A Toolbox for Load Balancing Development and Analysis in WarpX/AMReX Applications
A Formal Characterization of Non-Monotonicity in Tensor Cores
Enabling Efficient Runtime Data Analysis to a Crystal Deformation Simulation
C++ Standard Parallelism for GPU Programming in a Particle-In-Cell Application
Real-Time ML-Based Defense Against Malicious Payload in Reconfigurable Embedded Systems
European Open Web Index: Large Complex Graph Visualization
High Performance Batch SVD Using GPUs
Facilitating Mixed Python-Fortran HPC Codes: 4D Drift-Kinetic Simulations with Pyccel
Performance Engineering of Scientific Applications with MVAPICH and TAU Using Emerging Communication Primitives
CATIOS: Time-Resolved I/O-Aware Job Scheduling for HPC Systems
A Kokkos-Based Proxy of the Exascale Metagenome Assembler MetaHipMer2: A First Use of Kokkos for Computational Biology
An Approach for Correlating Compiler Optimizations with Runtime Performance
Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs
Time-Stepping Hamiltonian Simulation for Solving Nonlinear PDEs via a Quantum-Classical Hybrid Approach
Scaling Singular Values Beyond GPU Memory Limits: Out-of-Core, GPU-Accelerated, and Unified Across Data Precision and Hardware
Evaluating the Usage of Python Libraries on a Production Supercomputer
Unraveling Distant Galaxies: Analyzing IFU Data with Parsl and Academy
Mixed Compute Environments with OpenCHAMI
Mojo: Python-Like MLIR-Based GPU Portable Science Kernels
AutoSlim: Intelligent Automata Graph Optimization for Efficient Acceleration
Sync-Free GPU Parallelization of Sparse Kernels from Sequential Python Code
CROSS-HPC System Bayesian Optimization with Adaptive Transfer
csDF: A Double-Float Arithmetic Library for the Cerebras CS-2
Scalable Alternative Route Computation with ACE: A C++17 Library for HPC Traffic Simulations
Productive Scalable Distributed Task Scheduling Using an MPI-based Backend for Dagger
Compute System Simulator: Modeling the Impact of Allocation Policy and Hardware Reliability on HPC Cloud Resource Utilization
Accelerating Linear Solve with Mixed Precision Nested Recursive Subdivision on AI Hardware
ChatHPC: Building the Foundations for a Productive and Trustworthy AI-Assisted HPC Ecosystem
Configuring Large Language Models for Regional Ocean Model Development
Energy-Efficient Multimodal LLM Inference: Stage-Level Characterization and Input-Aware Controls
The Impact of Maximum Vector Length on Cache Management Techniques in RISC-V Vector Extension
Job Grouping-Based Intelligent Resource Recommendation Framework
VaultX Merge: Breaking Memory Barriers in Proof-of-Space Plot Generation
Template Task-Based Multiresolution Analysis in Hybrid Environments
JACC: Easy CPU/GPU Performance Portability for Scientific Applications in Julia
Mitigating I/O Bottlenecks in LiDAR Pipelines by Directly Merging Neural Decompression and Semantic Segmentation
Accelerating Scientific Workflows with LLM-Driven Compiler Optimizations for Generated High-Performance Hardware
WONDERS: Integrating WOW, PONDER, and SCALE for Enhanced Scheduling Performance
Multi-GPU Implementation and Roofline Analysis of a Numerical Global Ocean Model
Parallel Local Motif Counting on Large-Scale Dynamic Graphs
Unified Performance Modeling Stack for Distributed GPU Applications: Complementing Analytical Insights with Machine Learning
Process-Based Predictors of Vulnerability Reintroduction
A Quantum Solver for Multidimensional Partial Differential Equations: Practical Case Studies
Divergence Prediction System for CFD Simulations
Shipping HPC Ecosystems Across Platforms: Portable and Composable HPC Clusters as Code
Echoes of Earth: Building an Autonomous Environmental Lab for Acoustic Sensing
Memory-Efficient CFD Based on MPS: Effective One-Billion-Cell Resolution on a Single Node
PhySiViT: A Physics Simulation Vision Transformer
Understanding GPU Utilization Using LDMS Data on Perlmutter
Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs
Optimizing Collectives with Large Payloads on GPU-Based Supercomputers
Towards Application Agnostic HPC Profiling
Julia with Intelligent Runtime for Heterogeneous Computing
Distributed Modular Digital Twin Network for High-Performance and Reliable Data Centers
Applying Lossy Compression Techniques to GNN Training
WiCAT: Reducing Congestion at Wireless Interfaces in Heterogeneous Architectures
A Scalability Study of Quantum Algorithms for Dimensionality Reduction of Multidimensional Data
Can Long-Haul RDMA Benefit Federated Learning?
IncineRate: Multi-Modal FPGA Accelerator for SCNNs
Hardware-Aware Quantum Circuit Synthesis
Understanding LLM Behavior on HPC Data via Mechanistic Interpretability
Bridging the Quantum Coding Gap: Instruction-Tuned LLMs for Qiskit
GPU Kernels for Mixture of Experts
TidalMark: A Scalable Benchmark for Coastal Water Level Forecasting
Shortcut Mixup Policy: Toward Improving Robustness and Speed in Goal-Conditioned RL
Leveraging Large Language Models for Property Prediction in Polymorphic Organic Semiconductors
Chameleon Concierge: Retrieval-Augmented Generation (RAG) To Enhance Open Testbed Documentation
Unmasking Performance Variability in GPU Codes on Supercomputers
Learning To Select Scheduling Algorithms in OpenMP
CUR-MoE: Portable Mixture-of-Experts with Interpretable High-Ratio Compression
Algorithms and Applications of Dynamic Network Analysis Using CANDY
Divide, Conquer, and Denoise: Hybrid Parallel Diffusion with Memory-Aware Coarse-to-Fine Inference
SRAP: Sender-Side Receiver-Aware Port Selection for High-Speed Multi-Flow TCP
When Label Propagation Outperforms BFS in Breadth-First Graph Traversal
Wafer-Scale Simulation of Mutator Allele Dynamics in Large Asexual Populations
Exploring Fine-Grained Parallelism in Data-Flow Runtime Systems on Many-Core Systems
CIRE: LLVM Analysis for Floating-Point Rounding Error Affected by Precision and Optimizations
Massively Parallel Bayesian Inference Framework for GPU Supercomputers: Application to Estimation of Coseismic Fault Slip
Analyzing Dataset Popularity for Optimizing In-Network Storage
Novel Graph Alignment Algorithms for Identifying Non-Determinism in Large-Scale Simulations
DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
Local vs. Global FFT Approaches for High-Performance Ultrasound Simulation on Multi-GPU Systems
Explicit Low-Order Finite-Element Wave Simulation Accelerated with Variable-Precision Computing Using INT8 Tensor Cores
Between the NIC and a Hard Place: Evaluating 400 Gb/s Ethernet for HPC Data Transfers
ParaViz3D: MPI Trace Visualization with 3D Video
Advancing EEG Signal Analysis with Quantum Machine Learning
Classifying Performance Bounds Using Machine Learning
Using Hardware Metrics To Understand Performance of the RAJA Performance Suite Kernels in Different GPU Modes on MI300A
Range Search on Heterogeneous Systems with Processing-in-Memory Architecture
Characterizing Performance and Energy Trade-Offs on the Aurora Supercomputer
Intelligent Surrogates Pay Attention to Data, Improving Multi-Objective HPC Optimization
Numerical Investigation of Radiation Hydrodynamic Instabilities at Scale with FleCSI-HARD
Scalable Multi-Node Multi-GPU Datalog Engine with Energy-Aware Profiling
Evaluating the Power-Monitoring Capabilities of Aurora
Massively Parallel GPU Rasterizer for Next-Generation Computational Lithography
High-Performance Sparse Attention on Tensor Cores: Fused3S and Beyond
Evaluating LiDAR Compression for 3D Semantic Segmentation in Diverse Off-Road Environments on GOOSE Dataset
Towards a GPU-Accelerated Web-Based Graph Rendering Framework for Large-Scale Protein Networks
GATSched: Multi-Objective Graph Attention Networks for Energy-Efficient HPC Job Scheduling
Inference-as-a-Service Prototype at NERSC
Heterogeneity-Aware Task Allocation for Modern HPC Systems
Accelerating AI Co-Scientists with HPC Infrastructure
Understanding Communication Bottlenecks in Multi-Node LLM Inference
An Efficient GEMM Acceleration Method for LLM Inference with Variable-Length Sequences
Enhancing Usability and Performance in Experimental Environments Management
HydraCache: LLM Inference Prefill Parallelization Through Distributed Cache Blending
Tensor Core Accelerated Fast Multipole Method for GROMACS
Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-Based Matrix Inversion
Detecting Silent Data Corruption in Sparse Matrices Using Hardware Performance Counters
From Legacy to Portable: An Agentic AI Workflow for Fortran Code Translation and Cross-Architecture Optimization
GNNs on Evolving Graphs: A Benchmark of Incremental Updates and Meta-Learning Approaches
ScODA: An Emerging Pipeline for Evaluating Distributed Database Performance To Support Operational Data Analytics
An Agent-Based Viral Venture: Adaptive Tool Selection for Scalable Genomics
Building the Foundation for Machine Learning-Based Mars Weather Forecasting
Optimizing the GPU All-Reduce Using Multiple Processes Per GPU
Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization
Harmony: Converged Supercomputer Scratch and Archival Filesystems
AdversaGuard: A Distributed Data Poisoning Benchmark for Parallel AI
Optimizing Task-Driven Offloading in LLVM
Orchid: Towards Heterogeneous Batched Eigenvalue Solvers
Practical Viability of Translating Legacy Fortran Code to C++ Using Large Language Models
DiOMP-Offloading: Portable OpenMP Offloading for Distributed Heterogeneous Systems
Scalable Execution Framework for R on Manycore Systems
Seamless Scaling of Applications Across Programming Models