Site Reliability Engineer

3 months ago

Delhi, India

Senior

H1B Sponsor

Responsibilities

Develop an automation platform to manage infrastructure rollouts across cloud providers.
Optimize the telemetry platform to identify customer-impacting events and provide relevant data for debugging.
Partner with the engineering team to optimize the performance of services for cloud architecture.
Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA).
Participate in an SLA-driven on-call rotation, including after-hours and weekend participation.

Qualifications

5 years of demonstrated experience working as a Site Reliability Engineer.
Infrastructure automation experience with scripting skills in Python or Bash.
Experience with the Prometheus monitoring stack; familiarity with Grafana, Mimir, and Loki is a plus.
Knowledge of Kubernetes and the container ecosystem.
Strong cross-group collaboration and communication skills.
Familiarity with at least one of AWS, Azure, or Google Cloud.
Experience debugging, diagnosing, and troubleshooting complex production software.
B.S. degree in Computer Science or a related field.

AWS, Azure, Bash, Google Cloud, Grafana, Kubernetes, Prometheus, Python, SingleStore