24 days ago
Base Salary
$240k - $300k/yr
Responsibilities
- Architect and own reliability and infrastructure strategy across multiple teams and services.
- Own the observability, capacity planning, and monitoring strategy for complex distributed systems.
- Define and evolve SLI/SLO frameworks, error budgets, and production readiness standards org-wide.
- Lead incident management practices and escalation design, and drive systemic improvements from post-mortems.
- Champion toil reduction, on-call sustainability, and long-term system resilience.
- Partner across engineering to lead design reviews and raise the bar on deployment safety and operational quality.
Requirements
- Deep experience operating, designing, and improving large-scale production systems.
- Proven technical leadership that drives reliability outcomes across teams.
- Strong architectural judgment for distributed systems, including failure modes and scalability.
- Effective communication that spreads knowledge through post-mortems and design guidance.
Benefits
- Comprehensive salary, benefits, and tools for success.
- Medical, dental, and vision plans with multiple options.
- Generous parental leave and family support services.
- 401(k) with generous company match, plus unlimited PTO.
- Pre-tax commuter benefits and robust voluntary benefits.
