25 days ago
Stockholm, Sweden
Senior
Responsibilities
- Build, ship, and operate foundational platform services with full ownership.
- Maintain a high-signal observability stack and translate signals into action.
- Define and evolve SLIs/SLOs, alerting, and reliability reporting for critical systems.
- Improve on-call and incident response processes, including escalation paths and post-incident follow-ups.
- Reduce toil through automation and improved system ergonomics.
- Collaborate with product and platform engineers to design resilient systems.
Requirements
- Significant experience operating and improving production systems.
- Comfortable writing software and building automation to address reliability issues.
- Autonomous, with a focus on system quality and resilience.
- Strong understanding of failure modes, graceful degradation, and practical tradeoffs.
- Experience with observability, incident management, and on-call duties.
- Familiarity with cloud infrastructure and Kubernetes.
Tech Stack
Kubernetes
