Presentation

Architecting Tensor Core-Based Reductions for Irregular Molecular Docking Kernels
Description

Tensor Cores (TCs) are specialized hardware units designed for efficient matrix multiplication and are widely utilized in deep learning workloads. However, their adoption in more irregular high-performance computing (HPC) applications remains limited. This paper presents a methodology for effectively integrating TCs into a representative HPC application: molecular docking with AutoDock-GPU. The irregular computational patterns and strict accuracy requirements of this application pose significant challenges for TC utilization. To address these, we adopt a twofold strategy: (i) accelerating sum reduction operations using TCs, and (ii) applying state-of-the-art numerical error correction (EC) techniques to maintain accuracy. Experimental evaluations on NVIDIA A100, H100, and B200 GPUs show that our CUDA-based implementation consistently outperforms the baseline while preserving algorithmic accuracy.
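
The twofold strategy can be illustrated with a small CPU-side model. This is a minimal sketch under stated assumptions, not the implementation from the presentation: it assumes a 16x16 tile, binary16 inputs with binary32 accumulation, and a split-precision correction in which each value is stored as a binary16 head plus a rescaled binary16 residual. The abstract does not say which EC technique is used, so that choice, the scale factor `2**11`, and all function names (`to_half`, `matmul_fp32_acc`, `tile_sum`, `tc_sum`) are hypothetical. The arithmetic is emulated in plain Python rather than run on a GPU.

```python
import struct

N = 16             # tile edge, assumed to match a 16x16 tensor-core fragment
SCALE = 2.0 ** 11  # assumed rescaling so the residual survives binary16 rounding
ONES = [[1.0] * N for _ in range(N)]


def to_half(x):
    """Round x to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x):
    """Round x to the nearest IEEE 754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul_fp32_acc(a, b):
    """Model one tile product: binary16 inputs, binary32 accumulation."""
    out = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = 0.0
            for k in range(N):
                acc = to_single(acc + a[i][k] * b[k][j])
            out[i][j] = acc
    return out


def as_tile(flat):
    """Reshape a flat list of N*N values into an N x N row-major tile."""
    return [flat[r * N:(r + 1) * N] for r in range(N)]


def tile_sum(tile):
    """Sum a tile: every row of ONES @ tile holds the column sums."""
    col_sums = matmul_fp32_acc(ONES, tile)[0]
    total = 0.0
    for c in col_sums:  # finish the last N partial sums in binary32
        total = to_single(total + c)
    return total


def tc_sum(values, corrected=True):
    """Sum N*N binary32 values through the modelled matrix-multiply path."""
    assert len(values) == N * N
    xs = [to_single(v) for v in values]
    hi = [to_half(x) for x in xs]  # binary16 heads: what an uncorrected path sees
    total = tile_sum(as_tile(hi))
    if corrected:
        # Residual lost by binary16 rounding, rescaled and reduced separately.
        lo = [to_half((x - h) * SCALE) for x, h in zip(xs, hi)]
        total += tile_sum(as_tile(lo)) / SCALE
    return total
```

In this model, `tc_sum([0.1] * 256, corrected=False)` returns 25.59375, about 6e-3 away from the binary32 reference sum, because every input loses the same amount when rounded to binary16. With `corrected=True` the residual tile recovers most of that loss and the result lands within roughly 2e-6 of the reference. The price is a second tile product per reduction, which is the kind of accuracy-versus-throughput trade-off the abstract alludes to.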