Staff Site Reliability Engineer

2 days ago

San Jose, CA, USA

Staff+

H-1B Sponsor

Base Salary

$210k - $270k/yr

Responsibilities

Participate in and influence high-impact incident response efforts.
Define and evolve organization-wide incident practices and reliability tooling.
Architect and evolve observability platforms for actionable insights.
Lead the development of reliability and observability practices.
Guide teams in building resilient, fault-tolerant services.
Partner with cross-functional teams to ensure new systems are operable.
Design and implement internal tools for deployment safety and incident coordination.
Mentor engineers in operational rigor and reliability principles.

Requirements

8+ years of experience in operating and scaling production infrastructure.
Deep expertise in incident response and debugging distributed systems.
Strong knowledge of observability stacks and alerting strategies.
Experience with fault isolation and chaos engineering practices.
Proficiency in infrastructure-as-code and configuration management.
Ability to influence teams through standards and culture.
Strong communication skills for mentoring and aligning across teams.

Benefits

Flexible, hybrid work environment.
Unlimited vacation.
100% paid employee health benefit options.
Commuter benefits.
401(k) with employer-funded match.
Corporate wellness program.
Sabbatical leave for employees with 5+ years of service.
Competitive paid parental leave and fertility reimbursement.
Cell phone reimbursement.
Catered lunch every day along with beverages and snacks.
Employee Resource Groups and ZocClubs.
Great Place to Work Certified.

Tech Stack

Terraform

Categories

DevOps, Security