Remote, Mexico or Mexico City, Mexico
Mid Level / Senior
H1B Sponsor
Responsibilities
- Design systems with resilience and capacity in mind.
- Define and measure SLOs and SLIs reflecting customer experiences.
- Utilize Datadog and CloudWatch for effective observability.
- Configure alerting and routing for incident management.
- Continuously improve the incident lifecycle from detection to follow-up.
- Combine software fundamentals with reliability practices for system availability.
Requirements
- Bachelor's or master's degree in computer science or equivalent experience.
- 3+ years of experience in an SRE or Software Engineering role.
- Hands-on coding experience in Python and/or Go.
- Proven experience with large-scale distributed systems.
- Deep understanding of SLOs, SLIs, error budgets, and MTTR.
- Strong skills in observability and incident response.
- Ability to communicate across technical and non-technical teams.
- Experience with operational tooling and AI-assisted development.
- Leadership skills for strategic reliability initiatives and mentoring.
Benefits
- Remote work option from Mexico with potential hybrid opportunities.
- Healthcare coverage.
- Reimbursement for internet and cell phone expenses.
- Learning and development stipend.
- Potential travel opportunities to the Mountain View headquarters.