Bengaluru, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Design systems with resilience and capacity in mind.
- Define and measure SLOs and SLIs reflecting customer experience.
- Use Datadog and CloudWatch to build effective observability.
- Configure alerting and routing for incident management.
- Improve the incident lifecycle from detection to follow-up.
- Combine software fundamentals with reliability practices.
- Communicate effectively with technical and non-technical teams.
- Plan and execute reliability initiatives for the team.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 3+ years of experience in an SRE or Software Engineering role.
- Hands-on coding experience in at least two programming languages.
- Experience operating and managing production environments.
- A strong conviction that observability is essential to understanding service performance.
- Experience using SLOs, SLIs, and KPIs for decision-making.
- Familiarity with Google's Site Reliability Engineering (SRE) book and how to apply its practices in different contexts.
- Proficiency with AI-assisted development tools.
- Experience building AI workflows for operational efficiency.
- Ability to learn from production incidents and implement changes.
- Interest in mentoring peers to improve reliability.
Benefits
- Excellent employee benefits including healthcare.
- Internet/cell phone reimbursement.
- Learning and development stipend.
- Opportunities to collaborate with teams in Palo Alto and Bangkok.