Deep Learning Solutions Architect - Inference Optimization
NVIDIA
Remote, Spain
What you will be doing:
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in-depth analysis and optimization to ensure the best performance on GPU-based systems (in particular Grace/ARM-based systems). This includes supporting the optimization of large-scale inference pipelines.
- Partner with Engineering, Product and Sales teams to develop and plan the most suitable solutions for customers. Enable development and growth of product features through customer feedback and proof-of-concept evaluations.
What we need to see:
- Excellent verbal and written communication and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
- 5+ years of work or research experience with Python, C++, or other software development.
- Work experience with and knowledge of modern NLP, including a good understanding of transformer, state-space, diffusion, and MoE model architectures. This can include expertise in either training or optimization/compression/operation of DNNs.
- Understanding of key libraries used for NLP/LLM training (such as Megatron-LM, NeMo, DeepSpeed) and/or deployment (e.g. TensorRT-LLM, vLLM, Triton Inference Server).
- Enthusiastic about collaborating with various teams and departments such as Engineering, Product, Sales, and Marketing; thrives in dynamic environments and stays focused amid constant change.
- Self-starter with a growth mindset and a passion for continuous learning and sharing findings across the team.
Ways to stand out from the crowd:
- Demonstrated experience in running and debugging large-scale distributed deep learning training or inference processes.
- Experience working with larger transformer-based architectures for NLP, CV, ASR, or other domains.
- Experience applying NLP technology in production environments.
- Proficient with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data center design, high-speed interconnects (InfiniBand), cluster storage, and scheduling-related design and/or management experience.