Heartflow

Staff/Lead Site Reliability Engineer (SRE)

Posted 24 days ago
San Francisco, CA, USA
Staff+ / Senior
H1B Sponsor

Base Salary

$201k - $251k/yr

Responsibilities

  • Lead the design, implementation, and operation of reliable, scalable cloud infrastructure.
  • Define and begin rollout of SLI/SLO standards across microservices.
  • Develop self-service instrumentation tooling enabling engineering teams to own observability.
  • Establish observability and monitoring using an open-source (OSS) toolchain.
  • Serve as a technical escalation point for critical incidents and perform deep-dive root cause analyses.
  • Enhance monitoring, logging, and tracing systems for comprehensive visibility into system health.
  • Set the technical direction and best practices for the SRE and engineering organization.
  • Mentor mid-level and senior engineers on design patterns, operational rigor, and reliability principles.

Requirements

  • 8+ years of progressive experience in Site Reliability Engineering or a closely related role.
  • Deep expertise with AWS, Kubernetes, Helm, and observability stacks.
  • Fluency in at least one major scripting/programming language for automation and tooling.
  • Hands-on engineering mindset, with the ability to instrument services directly.
  • Track record of building or improving incident detection and response systems.
  • Deep technical familiarity with Kubernetes ecosystems and modern IaC tooling.
  • Exceptional communication skills for explaining complex technical issues.

Tech Stack

AWS, Go, Grafana, Harness, Helm, Istio, Java, Kubernetes, Prometheus, Python, Terraform

Categories