Deep Learning Solutions Architect - Inference Optimization

Nvidia

Remote, Spain

What you will be doing:

  • Work directly with key customers to understand their technology and provide the best AI solutions.
  • Perform in-depth analysis and optimization to ensure the best performance on GPU-accelerated systems (in particular Grace/ARM-based systems). This includes supporting the optimization of large-scale inference pipelines.
  • Partner with Engineering, Product, and Sales teams to develop and plan the most suitable solutions for customers. Enable the development and growth of product features through customer feedback and proof-of-concept evaluations.

What we need to see:

  • Excellent verbal and written communication and technical presentation skills in English.
  • MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields.
  • 5+ years of work or research experience in Python, C++, or other software development.
  • Work experience with and knowledge of modern NLP, including a good understanding of transformer, state-space, diffusion, and mixture-of-experts (MoE) model architectures. This can include expertise in either training or the optimization, compression, and operation of DNNs.
  • Understanding of key libraries used for NLP/LLM training (such as Megatron-LM, NeMo, and DeepSpeed) and/or deployment (e.g. TensorRT-LLM, vLLM, Triton Inference Server).
  • Enthusiastic about collaborating with various teams and departments such as Engineering, Product, Sales, and Marketing; this person thrives in dynamic environments and stays focused amid constant change.
  • Self-starter with a growth mindset, a passion for continuous learning, and a habit of sharing findings across the team.

Ways to Stand Out from the Crowd:

  • Demonstrated experience in running and debugging large-scale distributed deep learning training or inference processes.
  • Experience working with large transformer-based architectures for NLP, CV, ASR, or other domains.
  • Applied NLP technology in production environments.
  • Proficient with DevOps tools including Docker, Kubernetes, and Singularity.
  • Understanding of HPC systems, including data center design, high-speed InfiniBand interconnects, and cluster storage and scheduling, with related design and/or management experience.

Don't forget to mention EuroTechJobs when applying.

© EuroJobsites 2025