EarnIn

Staff Site Reliability Engineer

about 5 hours ago
Mountain View, CA, USA
Staff+
H1B Sponsor

Base Salary

$252k - $308k/yr

Responsibilities

  • Set an AI-centered reliability strategy, defining SLIs, SLOs, and error budgets.
  • Redesign the incident lifecycle around AI-assisted processes to make it faster.
  • Improve on-call processes through automation and AI-driven tools.
  • Integrate AI-first operations into product engineering workflows.
  • Architect resilient systems for capacity planning and failure isolation.
  • Mentor engineers on reliability practices and establish accessible documentation.

Requirements

  • 7+ years in SRE, Software Engineering, or Infrastructure Engineering with a focus on reliability.
  • Experience applying AI/LLMs to operational workflows in production.
  • Expertise in SLOs/SLIs, error budgets, and incident command in distributed systems.
  • Proficiency in software engineering with languages such as Python or Go.
  • Deep observability experience with tools like Datadog and CloudWatch.
  • Solid infrastructure-as-code skills with Terraform and AWS.
  • Familiarity with AI-assisted development tools and their application in workflows.
  • Experience in fintech or regulated environments is a plus.

Tech Stack

Amazon DynamoDB, Apache Kafka, AWS, Datadog, Go, Kubernetes, Python, Terraform

Categories