Presentation
Constraint-Driven Auto-Tuning of GEMM-Like Operators for MT-3000 Many-Core Processors
Description
Optimizing deep learning (DL) operators, especially GEMM-like operations, on heterogeneous many-core processors such as MT-3000 is difficult because of large search spaces and hardware-specific constraints. Existing methods, including hand-tuned libraries and auto-tuners, are either costly to develop or deliver limited performance. We propose DynaChain, an operator-level optimization framework for MT-3000. DynaChain separates computation from data movement, which allows each to be optimized independently and maximizes data reuse across schedules. To shrink the search space, it employs constraint dependency chains that dynamically prune invalid scheduling choices. For irregular matrix dimensions, DynaChain uses an integer linear programming (ILP)-based decomposition that avoids padding and improves hardware utilization. At the low level, it generates optimized micro-kernels tailored to MT-3000’s VLIW+SIMD architecture, improving register allocation and pipelining for irregular operations. Experiments on representative DL operators show that DynaChain eases kernel development for heterogeneous architectures while achieving performance comparable to expert-tuned libraries.
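As a rough illustration of the padding-free decomposition idea, the sketch below splits an irregular matrix dimension into an exact sum of micro-kernel widths while minimizing the number of kernel invocations. The widths in KERNEL_WIDTHS and the "fewest calls" objective are hypothetical assumptions, not DynaChain's actual formulation. Also, instead of calling an ILP solver, this small single-dimension integer program is solved exactly with dynamic programming.

```python
# Hypothetical micro-kernel widths (elements per call); the abstract does not
# give MT-3000's real kernel shapes, so these values are assumptions.
KERNEL_WIDTHS = (48, 32, 16, 8, 4)


def decompose(n, widths=KERNEL_WIDTHS):
    """Split dimension n into {width: count} with sum(width * count) == n.

    The objective (an assumption) is to minimize the total number of
    micro-kernel calls. Returns None when n cannot be covered exactly,
    i.e. when padding would be unavoidable with these widths.
    """
    inf = float("inf")
    best = [0] + [inf] * n   # best[k]: fewest calls covering exactly k elements
    choice = [0] * (n + 1)   # choice[k]: width used in the last call for k
    for k in range(1, n + 1):
        for w in widths:
            if w <= k and best[k - w] + 1 < best[k]:
                best[k] = best[k - w] + 1
                choice[k] = w
    if best[n] == inf:
        return None
    # Walk back through the recorded choices to recover per-width counts.
    counts = {}
    k = n
    while k > 0:
        w = choice[k]
        counts[w] = counts.get(w, 0) + 1
        k -= w
    return counts


if __name__ == "__main__":
    print(decompose(100))  # {48: 2, 4: 1}
    print(decompose(30))   # None: 30 is not a sum of the assumed widths
```

A real framework would likely weigh per-kernel efficiency rather than just the call count, and would couple several dimensions, which is where a general ILP formulation becomes useful.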
Event Type
Paper
Time
Tuesday, 18 November 2025, 11:37am - 12:00pm CST
Location
275
HPC for Machine Learning
