EarnIn

Site Reliability Engineer II

about 3 hours ago
Remote, Mexico or Mexico City, Mexico
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design systems with resilience and capacity in mind.
  • Define and measure SLOs and SLIs reflecting customer experiences.
  • Utilize Datadog and CloudWatch for effective observability.
  • Configure alerting and routing for incident management.
  • Continuously improve the incident lifecycle from detection to follow-up.
  • Combine software fundamentals with reliability practices for system availability.

Requirements

  • Bachelor's or master's degree in computer science or equivalent experience.
  • 3+ years of experience in an SRE or Software Engineering role.
  • Hands-on coding experience in Python and/or Go.
  • Proven experience with large-scale distributed systems.
  • Deep understanding of SLOs, SLIs, error budgets, and MTTR.
  • Strong skills in observability and incident response.
  • Ability to communicate across technical and non-technical teams.
  • Experience with operational tooling and AI-assisted development.
  • Leadership skills for strategic reliability initiatives and mentoring.

Benefits

  • Remote work option from Mexico with potential hybrid opportunities.
  • Healthcare coverage.
  • Reimbursement for internet and cell phone expenses.
  • Learning and development stipend.
  • Potential travel opportunities to the Mountain View headquarters.

Tech Stack

Datadog, Go, Python
