Extending RAJA Parallel Programming Abstractions with Just-In-Time Optimization
Description
The prevalence of heterogeneous computing systems, comprising both CPUs and GPUs, has led to the adoption of performance portability programming models such as RAJA. These models allow developers to write portable code that compiles ahead-of-time (AOT), unmodified, for different backends, thus improving productivity and maintainability.
In this work, we explore the integration of just-in-time (JIT) optimization into portable programming models. Our work aims to improve performance through JIT optimization without sacrificing portability or developer productivity.
We extend Proteus to support indirect kernel launching through RAJA's abstractions. Our evaluation with the RAJAPerf benchmark suite demonstrates promising speedups for both AMD and NVIDIA GPUs, with no slowdowns recorded for either backend. Specifically, we record speedups from $1.2\times$ up to $23\times$ on AMD MI250X and speedups from $1.1\times$ up to $15\times$ on NVIDIA V100, while preserving the performance portability and ease-of-use benefits of RAJA.


