Remote, Ireland
Senior
Responsibilities
- Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure.
- Diagnose and eliminate cross-layer failure modes.
- Design safe upgrade and rollout strategies at scale.
- Improve observability, automation, and operational ergonomics.
- Collaborate with database and platform teams for scaling and performance.
- Serve as a primary escalation point for incidents and participate in on-call.
- Manage relationships with system vendors.
Requirements
- 6+ years of engineering experience in SRE, platform engineering, or distributed systems roles.
- Experience operating distributed systems in production.
- Strong Kubernetes experience on AWS, GCP, or Azure.
- Solid understanding of distributed systems design and trade-offs.
- Proficiency in at least one programming language, preferably Go.
- Working knowledge of Linux internals and cloud storage.
- Experience in blameless incident response and post-incident reviews.
- Strong communication skills and ability to work autonomously.
Benefits
- 100% remote work with a global culture.
- 30 days of annual leave, including Grafana Shutdown Days.
- Equity and bonus opportunities.
- Access to modern AI coding tools and resources.
- Defined career growth pathways.