Advisory AI Infrastructure/MLOps Engineer

Lenovo

Edinburgh, United Kingdom

Lenovo is seeking a highly skilled AI Infrastructure Engineer/AI Operations Engineer to join our growing team. This critical role will focus on designing, building, and maintaining the infrastructure and tools necessary for efficient AI model development, deployment, and operation. Your expertise will enable our data scientists and engineers to focus on high-priority tasks while ensuring seamless operation of AI models in production. If you are passionate about making Smarter Technology For All, come help us realize our Hybrid AI vision!

Responsibilities:

  • AI Infrastructure Design and Implementation: Design, build, and maintain scalable and efficient AI infrastructure, including compute resources, storage solutions, and networking configurations;
  • AI Model Deployment and Management: Develop and implement processes for deploying, monitoring, and managing AI models in production environments;
  • Automation and Tooling: Create and maintain automation scripts and tools for AI model training, testing, evaluation, and deployment in a continuous integration/continuous delivery (CI/CD) pipeline;
  • Collaboration and Support: Work closely with data scientists, engineers, and other stakeholders to ensure smooth operation of AI systems and provide support as needed;
  • Performance Optimization: Continuously monitor and optimize AI infrastructure and models for performance, scalability, utilization, and reliability;
  • Security and Compliance: Ensure AI infrastructure and models comply with relevant security and regulatory requirements.

Qualifications:

  • Bachelor's or Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field;
  • 8+ years of experience in software engineering, DevOps, or a related field;
  • Strong background in computer systems, distributed systems, and cloud computing;
  • Proficiency in Linux system administration, including package management, user/group management, file system navigation, shell scripting (e.g., Bash), and system configuration (e.g., systemd, networking);
  • Proficiency in programming languages such as Python, Java, or C++;
  • Experience with AI-specific infrastructure and tools (e.g., NVIDIA GPUs and CUDA);
  • Experience with managing high-performance computing (HPC) clusters, including job scheduling, resource allocation, and cluster maintenance;
  • Experience with setting up multi-node distributed GPU clusters, leveraging Slurm, Kubernetes, or related software stacks;
  • Familiarity with configuring job scheduling tools (e.g., Slurm);
  • Experience with AI infrastructure, model deployment, and management;
  • Excellent problem-solving and analytical skills;
  • Strong communication and collaboration skills;
  • Ability to work in a fast-paced, dynamic environment.

Bonus Points:

  • Familiarity with AI and machine learning frameworks (e.g., PyTorch);
  • Familiarity with cloud platforms (e.g., AWS, GCP, Azure);
  • Experience with containerization (e.g., Docker) and orchestration (e.g., Kubernetes);
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana).

Don't forget to mention EuroTechJobs when applying.