Cognition

Site Reliability Engineer

7 days ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$260k - $300k/yr

Responsibilities

  • Define and own SLOs, SLIs, and error budgets for Devin and Windsurf.
  • Build monitoring, alerting, and observability systems for service health.
  • Lead incident response and conduct blameless postmortems.
  • Own deployment pipelines and internal developer tooling.
  • Manage cloud infrastructure through code to ensure scalability.
  • Model growth and forecast resource needs for infrastructure.
  • Integrate security as a core reliability requirement.
  • Collaborate with teams to build reliability into product development.

Requirements

  • Deep experience running production systems at scale.
  • Strong software engineering fundamentals.
  • Proficiency with cloud infrastructure (AWS, GCP, or Azure).
  • Experience with container orchestration (Kubernetes).
  • Familiarity with infrastructure as code (Terraform or equivalent).
  • Experience building and owning CI/CD pipelines.
  • Strong observability instincts for system instrumentation.
  • Comfort owning incidents end to end.

Benefits

  • Base salary of $260,000 - $300,000 plus significant early-stage equity.
  • Fully paid medical, dental, and vision for you and your dependents.
  • 401(k) with company match.
  • Perks include a private chef, cozy slippers, and endless snacks.

Tech Stack

AWS, Azure, Google Cloud Platform, Kubernetes, Terraform

Categories