Observability DevOps Site Reliability Engineer (SRE)
Cisco Systems
Oeiras, Portugal
Who You Are
We are seeking a highly skilled and experienced DevOps Site Reliability Engineer to join our team, focusing on the development and support of Observability capabilities for workloads across Cisco IT datacenter and cloud environments. This role involves reshaping how we manage alerts, metrics, and logs by introducing deep learning and GenAI to enhance reliability services. The ideal candidate will have a strong background in relevant Observability technologies and AI/ML, with a proven track record of delivering innovative solutions that enhance system monitoring, performance, and reliability. You will take ownership of and responsibility for reliability, scalability, automation, and other issues related to the uptime and availability of our monitoring solutions.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Computer Engineering, or a related field, or 5+ years of relevant experience.
- Understanding of IT lifecycle processes, including architecture, design, implementation, and operations
- Understanding of security, including OS hardening, firewalls, iptables, and working with InfoSec teams
- Understanding of networking basics such as routers and switches
- Experience with software development tools like GitHub and Jenkins
- Programming experience in Python, Shell, Go, or a similar language.
- Experience with the software development lifecycle, including design, development, testing, packaging, deployment, upgrade, and support.
- Open-source development experience.
- Familiarity with Agile software development.
- Leadership in building and maintaining SRE technologies.
- Experience with public cloud platforms such as AWS, GCP, or Azure.
- Experience with QA and testing of your own code and the entire platform.
Preferred Qualifications
- Experience with tool suites like Splunk Cloud, Splunk Observability Cloud, Elastic, Prometheus/Thanos & Grafana.
- Experience with ThousandEyes, Zabbix, AppDynamics (AppD), or similar tools is a plus.
- Experience with JavaScript, either Node.js or React.
- Experience implementing AI/ML and LLM-based agentic Observability use cases.
- Experience with infrastructure or application performance monitoring solutions, and testing experience in a diverse and complex infrastructure.
- Experience with on-premises cloud technologies using VMware or OpenStack.
- Experience with container technologies like OpenShift, Kubernetes, and Docker.
- Experience with building and maintaining Red Hat or CentOS Linux.
- Experience with configuration automation using Ansible.
Behavioral Competencies
- Working with geographically distributed teams
- Self-motivated and willing to help where help is needed
- Able to build relationships, show cultural sensitivity, align on goals, and learn quickly