Performance Engineering for Sparse Linear Solvers
Description

This tutorial covers code analysis, performance modeling, and optimization for sparse linear solvers on CPUs and GPUs. Performance engineering is often taught using simple loops as examples for performance models and how they can guide optimization; however, full, preconditioned linear solvers comprise multiple loops and an iteration scheme that is executed to convergence. Consequently, the concept of "optimal performance" must account for both hardware efficiency and solver convergence.

After introducing basic notions of hardware organization and storage for dense and sparse data structures, we show how to apply the roofline model to such solvers in predictive and diagnostic ways, and how it can be used to assess the hardware efficiency of a solver, covering important corner cases such as memory boundedness. We then advance to preconditioned solvers, using the conjugate gradient (CG) method as the leading example. Bottlenecks of the solver are identified, followed by the introduction of optimization techniques such as preconditioning and cache blocking. The interplay among solver performance, convergence, and time to solution is given special attention.

In hands-on exercises, attendees will be able to carry out experiments on a GPU cluster and study the influence of matrix data formats, preconditioners, and cache optimizations.
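To make the sparse storage formats mentioned above concrete, here is a minimal sketch of a sparse matrix-vector multiply (SpMV) in the CSR (Compressed Sparse Row) format, the kernel at the heart of most sparse solvers. The function name and example matrix are illustrative, not taken from the tutorial material:

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give their column and value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y

# Example: A = [[4, 0, 1],
#               [0, 3, 0],
#               [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [4.0, 1.0, 3.0, 2.0, 5.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 7.0]
```

The inner loop streams `values` and `col_idx` contiguously but accesses `x` irregularly through `col_idx`, which is exactly why data format and cache behavior matter for SpMV performance.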
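Since the tutorial uses CG as its leading example, a compact sketch of the unpreconditioned conjugate gradient iteration for Ax = b with A symmetric positive definite may be useful for orientation. A dense matrix-vector product keeps the sketch short; a real solver would use a sparse format and, as discussed above, a preconditioner:

```python
def cg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned conjugate gradient for Ax = b, A SPD."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # residual r = b - A x0, with x0 = 0
    p = r[:]                      # initial search direction
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rr_new = sum(ri * ri for ri in r)
        if rr_new ** 0.5 < tol:   # converged: residual norm below tol
            break
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]
        rr = rr_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(cg(A, b))                   # close to [1/11, 7/11]
```

One matrix-vector product, two inner products, and three vector updates per iteration: this loop structure is what makes "optimal performance" for a solver a question of both per-iteration hardware efficiency and the number of iterations to convergence.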



