We're looking for a Site Reliability Engineer (SRE) to join our team in London. As an SRE, you will play a critical role in keeping our systems reliable and scalable. You will work closely with our development teams to design, implement, and operate efficient systems. Your primary focus will be keeping our services running smoothly by identifying and resolving issues before they impact our users.
Responsibilities:
- Collaborate with development teams to design, implement, and operate scalable and efficient systems
- Identify and resolve issues before they impact our users
- Develop and maintain monitoring and alerting systems to ensure system health and performance
- Work with cross-functional teams to identify and prioritize improvements to our systems
- Stay up to date with industry trends and best practices in SRE and cloud computing
Requirements:
- Strong understanding of cloud computing platforms (AWS, GCP, Azure)
- Experience with containerisation (Docker, Kubernetes)
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK)
- Strong problem-solving skills and ability to work independently
- Excellent communication and collaboration skills
Benefits:
- Competitive salary and benefits package
- Opportunity to work with a talented team of engineers
- Flexible working hours and remote work options
- Access to cutting-edge technology and tools
If you're passionate about building scalable, reliable systems and enjoy working in a fast-paced environment, we'd love to hear from you.