Bengaluru, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Own incident management practices across all production systems.
- Act as the primary Incident Manager for high-priority production incidents.
- Administer and optimize CI/CD pipelines for efficient deployments.
- Continuously improve incident response runbooks and escalation matrices.
- Drive root cause analysis for major incidents and track action items.
- Establish and enforce SLA/SLO/SLI frameworks across production services.
- Build automated runbooks and self-healing mechanisms.
- Implement synthetic monitoring to detect customer-facing issues.
- Use Splunk for incident investigation and observability.
Requirements
- 3+ years of experience in SRE, DevOps, or Observability Engineering roles.
- Hands-on experience leading incident response for high-severity incidents.
- Strong background in Linux systems administration and troubleshooting.
- Experience defining and managing SLOs, SLIs, and error budgets.
Benefits
- Generous time off policies.
- Top-shelf benefits.
- Education, wellness, and lifestyle support.
