Zocdoc

Staff Site Reliability Engineer

2 days ago
San Jose, CA, USA
Staff+
H1B Sponsor

Base Salary

$210k - $270k/yr

Responsibilities

  • Participate in and influence high-impact incident response efforts.
  • Define and evolve organization-wide incident practices and reliability tooling.
  • Architect and evolve observability platforms for actionable insights.
  • Lead the development of reliability and observability practices.
  • Guide teams in building resilient, fault-tolerant services.
  • Partner with cross-functional teams to ensure new systems are operable.
  • Design and implement internal tools for deployment safety and incident coordination.
  • Mentor engineers in operational rigor and reliability principles.

Requirements

  • 8+ years of experience in operating and scaling production infrastructure.
  • Deep expertise in incident response and debugging distributed systems.
  • Strong knowledge of observability stacks and alerting strategies.
  • Experience with fault isolation and chaos engineering practices.
  • Proficiency in infrastructure-as-code and configuration management.
  • Ability to influence teams through standards and culture.
  • Strong communication skills for mentoring and aligning across teams.

Benefits

  • Flexible, hybrid work environment.
  • Unlimited vacation.
  • 100% paid employee health benefit options.
  • Commuter benefits.
  • 401(k) with employer-funded match.
  • Corporate wellness program.
  • Sabbatical leave for employees with 5+ years of service.
  • Competitive paid parental leave and fertility reimbursement.
  • Cell phone reimbursement.
  • Catered lunch every day along with beverages and snacks.
  • Employee Resource Groups and ZocClubs.
  • Great Place to Work Certified.

Tech Stack

Terraform

Categories

DevOps
Security