Grafana

Senior Software Engineer - Grafana Databases, SRE | United Kingdom | Remote

6 days ago
Remote, United Kingdom
Senior

Responsibilities

  • Partner closely with product engineering squads to enhance reliability.
  • Own production reliability for high-SLA customer environments.
  • Design and implement automation to scale reliability practices.
  • Ensure customer environments meet their service level objectives (SLOs).
  • Define and evolve per-tenant SLOs and reliability models.
  • Proactively reduce SLO burn to prevent repeat incidents.
  • Serve as a primary escalation point and on-call for incidents.
  • Lead customer-impacting incident response and post-incident reviews.
  • Contribute to design documents and code reviews.
  • Influence feature design for production scalability and operability.
  • Build automation to eliminate toil.
  • Improve alert quality and reduce noisy escalations.

Requirements

  • 6+ years of engineering experience, with 3+ years in SRE/CRE/production engineering.
  • Strong Kubernetes experience in AWS, GCP, or Azure.
  • Familiarity with infrastructure-as-code tooling like Helm and Terraform.
  • Experience operating multi-tenant systems in production.
  • Strong experience designing and implementing SLOs.
  • Proficiency in one or more programming languages (e.g., Go, Python, Java).
  • Knowledge of Linux internals, networking, and cloud storage.
  • Excellent problem-solving and troubleshooting skills.
  • Experience in blame-free incident response and writing high-quality post-incident reviews.
  • Ability to reason about performance, scaling, and failure modes.
  • Comfortable working in a self-directed engineering team.
  • Ability to partner deeply with product engineering teams.
  • Intellectual curiosity, transparency, and kindness are highly valued.

Benefits

  • 100% remote work with a global culture.
  • Opportunities for career growth and development.
  • Transparent communication and open decision-making.
  • Access to modern AI coding assistants and tools.
  • 30 days of annual leave, including Grafana Shutdown Days.
  • In-person onboarding for new employees.

Tech Stack

AWS, Azure, Go, Google Cloud Platform, Grafana, Helm, Java, Kubernetes, Linux, Python, Terraform

Categories

Data Engineering, DevOps