4 months ago
San Francisco, CA, USA
Senior / Staff+
Base Salary
$130k - $500k/yr
Responsibilities
- Own reliability and production safety for core shared services and customer-facing systems.
- Partner with infrastructure leadership to define SRE priorities and production safety roadmap.
- Repair and improve the architecture of production systems for stability and resource efficiency.
- Introduce modern SRE practices across engineering teams.
- Collaborate with engineering and applied AI teams to support sustainable growth.
- Represent SRE best practices and assist teams in safe onboarding to production.
Requirements
- Hands-on experience doing SRE work across multiple roles or companies.
- Deep familiarity with SRE practices popularized by Google.
- 5+ years of SRE experience; 15+ years of overall experience preferred.
- Proven success operating distributed systems at scale.
- Strong collaboration skills with cross-functional engineering teams.
- Ability to drive cultural change around reliability while being hands-on.
Tech Stack
AWS, Kubernetes, Terraform
