GrepJob
EarnIn

Senior Site Reliability Engineer

about 3 hours ago
Remote, Mexico or Mexico City, Mexico
Senior / Mid Level
H1B Sponsor

Responsibilities

  • Design systems with resilience and capacity in mind.
  • Define and measure SLOs and SLIs that reflect customer experiences.
  • Use Datadog and CloudWatch for observability.
  • Configure alerting and routing for effective incident management.
  • Continuously improve the incident lifecycle from detection to follow-up.
  • Combine software fundamentals with reliability thinking to keep systems available.

Requirements

  • Bachelor's or master's degree in computer science or equivalent experience.
  • 4+ years of experience in an SRE or Software Engineering role.
  • Hands-on coding experience in Python and/or Go.
  • Proven experience with large-scale distributed systems.
  • Deep understanding of SLOs, SLIs, error budgets, and MTTR.
  • Strong skills in observability and incident response.
  • Ability to communicate across technical and non-technical teams.
  • Experience with operational tooling and AI-assisted development.
  • Leadership skills to plan and lead reliability initiatives.

Benefits

  • Healthcare coverage.
  • Internet and cell phone reimbursement.
  • Learning and development stipend.
  • Potential travel opportunities to the Mountain View headquarters.

Tech Stack

Datadog, Go, Python

Categories