GetYourGuide

Staff Site Reliability Engineer

about 3 hours ago
Berlin, Germany
Staff+

Responsibilities

  • Drive down incident frequency, MTTD, and MTTR.
  • Lead post-incident reviews and implement systemic improvements.
  • Build tooling and runbooks for faster issue diagnosis and resolution.
  • Champion a culture of blameless incident handling and continuous improvement.
  • Participate in the infrastructure on-call rotation.
  • Advance Datadog-based observability practices.
  • Ensure meaningful SLOs and actionable alerts.
  • Enable production debugging capabilities for engineers.
  • Improve change failure rates through automated testing and validation.
  • Reduce deployment costs and risks with better tooling and practices.
  • Design and maintain well-documented development paths.
  • Work hands-on with product teams to enhance system design and operational hygiene.
  • Identify cost optimization opportunities across services.
  • Leverage AI tooling to improve incident response and workflows.

Requirements

  • Deep understanding of observability tooling, particularly Datadog.
  • Proven experience in reducing MTTD, MTTR, and change failure rates.
  • Strong coding skills in Java and familiarity with Go.
  • Experience with Kubernetes, AWS, and service mesh technologies.
  • Solid understanding of distributed systems and container technology.
  • Hands-on experience with CI/CD and automated testing strategies.
  • Ability to influence teams without direct authority.
  • Excellent written and verbal communication skills in English.
  • Positive, proactive team player passionate about operational excellence.

Benefits

  • Annual personal growth budget and mentorship programs.
  • Work from anywhere in the world for 40 days per year.
  • Flexible working arrangements for work-life balance.
  • Opportunities for team collaboration and social events.
  • Monthly transportation and fitness budget.
  • Discounts on GetYourGuide activities for you and your family.
  • Language-learning reimbursement program.
  • Health and wellness benefits.

Tech Stack

AWS, Datadog, Go, Istio, Java, Kubernetes, React, Vue.js

Categories

AI & ML, Backend, DevOps