Staff Site Reliability Engineer
GetYourGuide
Posted about 3 hours ago
Berlin, Germany
Staff+
Responsibilities
- Drive down incident frequency, MTTD, and MTTR.
- Lead post-incident reviews and implement systemic improvements.
- Build tooling and runbooks for faster issue diagnosis and resolution.
- Champion a culture of blameless incident handling and continuous improvement.
- Participate in the infrastructure on-call rotation.
- Advance Datadog-based observability practices.
- Ensure meaningful SLOs and actionable alerts.
- Enable production debugging capabilities for engineers.
- Improve change failure rates through automated testing and validation.
- Reduce deployment costs and risks with better tooling and practices.
- Design and maintain well-documented development paths.
- Work hands-on with product teams to enhance system design and operational hygiene.
- Identify cost optimization opportunities across services.
- Leverage AI tooling to improve incident response and workflows.
Requirements
- Deep understanding of observability tooling, particularly Datadog.
- Proven experience in reducing MTTD, MTTR, and change failure rates.
- Strong coding skills in Java and familiarity with Go.
- Experience with Kubernetes, AWS, and service mesh technologies.
- Solid understanding of distributed systems and container technology.
- Hands-on experience with CI/CD and automated testing strategies.
- Ability to influence teams without direct authority.
- Excellent written and verbal communication skills in English.
- Positive, proactive team player passionate about operational excellence.
Benefits
- Annual personal growth budget and mentorship programs.
- Work from anywhere in the world for 40 days per year.
- Flexible working arrangements for work-life balance.
- Opportunities for team collaboration and social events.
- Monthly transportation and fitness budget.
- Discounts on GetYourGuide activities for you and your family.
- Language reimbursement program.
- Health and wellness benefits.
Tech Stack
AWS, Datadog, Go, Istio, Java, Kubernetes, React, Vue.js
Categories
AI & ML, Backend, DevOps