Lead GPU Kernel Development Engineer
AMD - Advanced Micro Devices
Cambridge, United Kingdom
THE ROLE:
As a core member of the team, you will play a pivotal role in optimizing and developing deep learning frameworks for AMD GPUs. Your expertise will be critical in enhancing GPU kernels, deep learning models, and training/inference performance across multi-GPU and multi-node systems. You will engage with both internal GPU library teams and open-source maintainers to ensure seamless integration of optimizations, utilizing cutting-edge compiler technologies and advanced engineering principles to drive continuous improvement.
THE PERSON:
We are seeking an industry-leading expert C++ developer with advanced technical and analytical skills in Linux environments. The ideal candidate will excel in providing technical leadership, guiding teams, and driving projects and initiatives independently. You will define goals and scope and own development efforts while collaborating effectively within a high-performing team.
KEY RESPONSIBILITIES:
- Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
- Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
- Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
- Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
- Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
- Work in Distributed Computing Environments: Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
- Utilize Cutting-Edge Compiler Tech: Leverage advanced compiler technologies to improve deep learning performance.
- Optimize Deep Learning Pipeline: Enhance the full pipeline, including integrating graph compilers.
- Software Engineering Best Practices: Apply sound engineering principles to ensure robust, maintainable solutions.
- Lead, Guide & Mentor: Provide strategic direction and mentorship to junior team members, fostering growth and collaboration through code reviews, knowledge sharing, and technical guidance.
PREFERRED EXPERIENCE:
- GPU Kernel Development & Optimization: Deep expertise in designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM). Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
- Deep Learning Integration: Proven experience integrating GPU-accelerated compute into ML frameworks (e.g., PyTorch, TensorFlow), with a focus on throughput, scalability, and efficient execution for training and inference workloads.
- Software Engineering Excellence: Advanced proficiency in Python and C++ with deep experience in performance tuning, debugging, and robust test design, ensuring reliable, maintainable, high-performance codebases.
- High-Performance Computing: Broad and in-depth experience with large-scale, heterogeneous compute environments; adept at optimizing AI workloads for performance, efficiency, and resource utilization across clusters.
- Compiler Optimization: Thorough and detailed understanding of compiler internals, LLVM, and ROCm, with the ability to drive system-level optimizations from source to machine code.
ACADEMIC CREDENTIALS:
- Master’s and/or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- 7+ years of professional experience in technical software development, with a focus on GPU optimization, performance engineering, and framework development.