Senior Software Engineer - Grafana Databases, SRE | United Kingdom | Remote
Grafana
6 days ago
Remote, United Kingdom
Senior
Responsibilities
- Partner closely with product engineering squads to enhance reliability.
- Own production reliability for high-SLA customer environments.
- Design and implement automation to scale reliability practices.
- Ensure customer environments meet their service level objectives (SLOs).
- Define and evolve per-tenant SLOs and reliability models.
- Proactively reduce SLO burn to prevent repeat incidents.
- Serve as a primary escalation point and on-call for incidents.
- Lead customer-impacting incident response and post-incident reviews.
- Contribute to design documents and code reviews.
- Influence feature design for production scalability and operability.
- Build automation to eliminate toil.
- Improve alert quality and reduce noisy escalations.
Requirements
- 6+ years of engineering experience, with 3+ years in SRE/CRE/production engineering.
- Strong Kubernetes experience in AWS, GCP, or Azure.
- Familiarity with infrastructure-as-code tooling like Helm and Terraform.
- Experience operating multi-tenant systems in production.
- Strong experience designing and implementing SLOs.
- Proficiency in one or more programming languages (e.g., Go, Python, Java).
- Knowledge of Linux internals, networking, and cloud storage.
- Excellent problem-solving and troubleshooting skills.
- Experience in blame-free incident response and writing high-quality post-incident reviews.
- Ability to reason about performance, scaling, and failure modes.
- Comfortable working in a self-directed engineering team.
- Ability to partner deeply with product engineering teams.
- Intellectual curiosity, transparency, and kindness are highly valued.
Benefits
- 100% remote work with a global culture.
- Opportunities for career growth and development.
- Transparent communication and open decision-making.
- Access to modern AI coding assistants and tools.
- 30 days of annual leave, including Grafana Shutdown Days.
- In-person onboarding for new employees.
Tech Stack
AWS, Azure, Go, Google Cloud Platform, Grafana, Helm, Java, Kubernetes, Linux, Python, Terraform
Categories
Data Engineering, DevOps