Site Reliability Engineer
EMBL-EBI - European Bioinformatics Institute
Hinxton, United Kingdom
Your role
EMBL-EBI is at the forefront of research and services in the life sciences. Our IT & Technical Services department's Operations team plays a crucial role in supporting this mission by maintaining and developing a range of critical services.
We are seeking a dynamic and experienced Senior Site Reliability Engineer to join our small IT Operations team.
This role is essential for ensuring the availability, reliability and efficiency of EMBL-EBI’s service portfolio, which supports our scientific and research communities. This team has a broad remit, so a background in running, maintaining and enhancing a wide range of services in a medium-sized enterprise environment would be advantageous.
Your role will include
- Identity Management: Actively participate in upgrading and standardising our authentication and authorisation infrastructure, which manages thousands of internal and virtual accounts. Experience with Active Directory, Microsoft Entra ID or Red Hat IdP would be advantageous.
- Email Systems: Initially focus on understanding, upgrading, and maintaining our email systems, including Postfix, Cyrus, Roundcube, and Mailman. Experience of migrating to and running O365 mail would be an advantage.
- Service Management: Jointly support and develop services such as Transfer Services, software-defined object storage, authentication and authorisation infrastructure, and the Request Tracker ticketing system.
- Monitoring Systems: Help maintain and evolve the distributed Checkmk monitoring system and help develop a broader monitoring strategy.
- Orchestration Infrastructure: Become fluent in the orchestration infrastructure (Gerrit, Foreman, RPM repositories, Puppet) used to deploy, update and maintain over 3,000 servers.
- Core Modules Maintenance: Maintain and improve core Puppet modules, such as those for NTP and SSSD, and manage templates for new Red Hat OS versions.
- Documentation: Write thorough documentation and Standard Operating Procedures to support our Service Desk team and improve the service experience for users.
You may also have
- Education: A degree (or equivalent level of qualification) in a relevant technical subject.
- Experience: Proven experience in systems management and operations, infrastructure engineering, or site reliability engineering.
- Technical Skills: Strong proficiency with Linux at scale, email systems (Postfix, Cyrus, Roundcube, Mailman, O365), and orchestration tools (Puppet, Foreman).
- Problem-Solving: Excellent problem-solving skills and a proactive attitude towards continuous improvement.
- Team Player: Ability to work collaboratively in a multicultural, multidisciplinary team and a willingness to share knowledge and learn from others.
- Initiative: A self-starter who can take ownership of tasks and bring the team along.
You have
- At least 5 years of hands-on experience with Linux production systems in on-prem and potentially cloud hosting environments.
- Experience with Postfix mail systems.
- Experience with orchestration and automation.
- A strong sense of responsibility and ethics.
- Experience with 389 Directory Server or OpenLDAP.
- Puppet expertise.
- Comfort with tcpdump, strace and log parsing at scale.
- Experience reviewing Python, Bash and Puppet code created by other team members.
- Experience of taking on high-level responsibilities and guiding others.
You might also have
- A desire to contribute to scientific research from an IT perspective.
- Experience within a high-performing team.
- ITSM experience (we use the ServiceNow platform).