GPU Kernel Engineer · mako.dev · New York, NY
Description
Our R&D team is seeking expert-level GPU kernel engineers to help build the world's best LLMs and agents for GPU kernel generation.
The goal is simple: design an AI agent that writes and optimizes kernels the same way you do. You will collaborate with the training team to define robust evaluation, validation, and reward models used to train LLMs in the art of GPU kernel engineering. You will also contribute to the AI agent architecture itself, defining the workflows that enable an LLM to discover and implement high-performance GPU kernels.
**This job is based in either Gdansk or New York City.** Remote work will be considered for exceptional candidates.
Responsibilities:
- Explore and analyze performance bottlenecks in ML training and inference.
- Develop and optimize high-performance computing kernels in Triton, CUDA, and/or ROCm.
- Implement programming solutions in C/C++ and Python.
- Deep dive into GPU performance optimizations to maximize efficiency and speed.
- Collaborate with the team to extend and improve existing machine learning compilers or frameworks such as MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT. (Optional but beneficial.)
Requirements
- Bachelor's, Master's, or PhD in Computer Science, Electrical Engineering, or a related field.
- Strong programming skills in C/C++ and Python.
- Deep understanding and experience in GPU performance optimizations.
- Proven experience with kernel optimizations on CUDA, ROCm, or other accelerators.
- General experience with the training and deployment of ML models.
- Experience with distributed systems development or distributed ML workloads.
Company Description
Mako is a venture-backed AI lab building tools to automate algorithm discovery and GPU performance engineering. There are two core components:
1. MakoGenerate writes GPU kernels in CUDA, HIP, and Triton using LLMs.
2. MakoOptimize automatically selects and swaps GPU kernels while tuning inference engine hyperparameters (vLLM, SGLang, etc.) to optimize performance.
Bonus Points:
- Experience with innovative OSS projects like FlashAttention, FlashInfer, vLLM, and SGLang.
- Experience with machine learning compilers or frameworks such as TVM, MLIR, PyTorch, TensorFlow, ONNX Runtime, and TensorRT.
2025-10-19

Event Type: Job Posting
Time: Monday, 17 November 2025, 4:03pm - 4:03pm CST
Location: Hall 6, United States of America
mako.dev · In-person · Remote · Full Time · Permanent
