Staff Software Engineer I - SRE

20 days ago

Remote, India

Staff+

H1B Sponsor

Responsibilities

Analyze systemic failure patterns and design improvements to prevent incidents.
Define and maintain SLO/SLA frameworks and use error budgets to prioritize reliability investments.
Build tooling and automation to reduce incident response toil.
Own Rootly configuration and integrations with incident management tools.
Analyze reliability data and build dashboards to drive action.
Serve as an on-call Incident Commander for production incidents.
Develop and deliver training programs for engineering teams.
Edit and review customer-facing incident documents for clarity and quality.
Partner with engineering leaders to enhance reliability practices.