Remote, Mexico or Mexico City, Mexico
Senior / Mid Level
H1B Sponsor
Responsibilities
- Design systems with resilience and capacity in mind.
- Define and measure SLOs and SLIs that reflect customer experiences.
- Utilize Datadog and CloudWatch for observability.
- Configure alerting and routing for effective incident management.
- Continuously improve the incident lifecycle from detection to follow-up.
- Combine software fundamentals with reliability thinking to keep systems available.
Requirements
- Bachelor's or master's degree in computer science or equivalent experience.
- 4+ years of experience in an SRE or Software Engineering role.
- Hands-on coding experience in Python and/or Go.
- Proven experience with large-scale distributed systems.
- Deep understanding of SLOs, SLIs, error budgets, and MTTR.
- Strong skills in observability and incident response.
- Ability to communicate across technical and non-technical teams.
- Experience with operational tooling and AI-assisted development.
- Leadership skills to plan and lead reliability initiatives.
Benefits
- Healthcare coverage.
- Internet and cell phone reimbursement.
- Learning and development stipend.
- Potential travel opportunities to the Mountain View headquarters.