SingleStore

Site Reliability Engineer

3 months ago
Delhi, India
Senior
H1B Sponsor

Responsibilities

  • Develop an automation platform to manage infrastructure rollouts across cloud providers.
  • Optimize the telemetry platform to identify customer-impacting events and provide relevant data for debugging.
  • Partner with the engineering team to optimize service performance for cloud architecture.
  • Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA).
  • Participate in an SLA-driven on-call rotation, including after-hours and weekend participation.

Requirements

  • 5 years of demonstrated experience working as a Site Reliability Engineer.
  • Infrastructure automation experience with scripting skills in Python or Bash.
  • Experience with the Prometheus monitoring stack; familiarity with Grafana, Mimir, and Loki is a plus.
  • Knowledge of Kubernetes and the container ecosystem.
  • Strong cross-group collaboration and communication skills.
  • Familiarity with at least one of AWS, Azure, or Google Cloud.
  • Experience debugging, diagnosing, and troubleshooting complex production software.
  • B.S. degree in Computer Science or a related field.

Tech Stack

AWS, Azure, Bash, Google Cloud, Grafana, Kubernetes, Prometheus, Python, SingleStore
