Staff Site Reliability Engineer

3 months ago

Pune, India

Staff+

H1B Sponsor

Responsibilities

Architect frameworks and self-service tooling for service reliability.
Drive the AIOps strategy for automated diagnostics and proactive failure prevention.
Embed SRE practices across engineering through design reviews and operational standards.
Act as Incident Commander during critical events and lead blameless postmortems.
Deliver end-to-end monitoring and profiling to optimize system performance.
Mentor engineers across SRE and product teams through technical guidance.

Requirements

8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
At least 3 years in a Senior+ SRE position.
Strong background in running production SaaS systems at scale.
Proficiency in at least one programming or scripting language, such as Python or Go.
Hands-on expertise with cloud platforms such as AWS, GCP, or Azure.
Deep understanding of networking fundamentals including TCP/IP and DNS.
Experience with monitoring and alerting tools like Prometheus and Grafana.
Familiarity with advanced observability tools such as OpenTelemetry (OTel).
Proven experience managing high-severity incidents.
Strong troubleshooting skills across the full stack.
Excellent communication and collaboration skills.