EarnIn

Site Reliability Engineer II

about 7 hours ago
Bengaluru, India
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design systems with resilience and capacity in mind.
  • Define and measure SLOs and SLIs reflecting customer experience.
  • Utilize Datadog and CloudWatch for effective observability.
  • Configure alerting and routing for incident management.
  • Improve the incident lifecycle from detection to follow-up.
  • Combine software fundamentals with reliability practices.
  • Communicate effectively with technical and non-technical teams.
  • Plan and execute reliability initiatives for the team.

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience in an SRE or Software Engineering role.
  • Hands-on coding experience in at least two programming languages.
  • Experience managing production environments effectively.
  • Strong belief in the importance of observability for service performance.
  • Experience using SLOs, SLIs, and KPIs for decision-making.
  • Familiarity with the SRE book and its application in different contexts.
  • Proficiency with AI-assisted development tools.
  • Experience building AI workflows for operational efficiency.
  • Ability to learn from production incidents and implement changes.
  • Interest in mentoring peers to improve reliability.

Benefits

  • Excellent employee benefits including healthcare.
  • Internet/cell phone reimbursement.
  • Learning and development stipend.
  • Opportunities to collaborate with teams in Palo Alto and Bangkok.

Tech Stack

Datadog

Categories