Presentation
BLAZE: Exploiting Hybrid Parallelism and Size-Customized Kernels to Accelerate BLASTP on GPUs
Description
The Basic Local Alignment Search Tool (BLAST), often referred to as the Google of biological research, is widely used to query a large database to find homologous sequences. Though there have been attempts to accelerate protein BLAST on GPUs, they remain slower than multi-threaded implementations. In this paper, we introduce BLAZE, a GPU-accelerated drop-in replacement for protein BLAST that produces identical results while achieving speedups over multi-threaded and GPU-accelerated implementations.
BLAZE's three key innovations are: (1) the use of hybrid (fine-grained + coarse-grained) parallelism, (2) the use of size-customized kernels, unlike previous "one-size-fits-all" approaches, and (3) the use of common-case GPU optimizations that are difficult to support in the general case. On an 8-core system with an NVIDIA RTX 3080 GPU, searching the 266 GB nr database, BLAZE achieves, on average, an 18.2x speedup over single-threaded BLASTP, a 4.8x speedup over previous GPU-accelerated baselines, and a 1.9x speedup over 16-way multithreaded BLASTP.
Event Type
Paper
Time
Thursday, 20 November 2025, 3:30pm - 3:52pm CST
Location
263-264
Applications
