Gruve

Senior Software Development Engineer - SRE

1 day ago
Pune, India
Senior / Mid Level
H1B Sponsor

Responsibilities

  • Architect reliability improvements across Kubernetes, GPU infrastructure, MLOps, networking, and monitoring.
  • Lead incident management, blameless post-mortems, and error-budget policies.
  • Drive automation, IaC, and reliability tooling at scale.
  • Oversee metrics, logs, tracing, and dashboards; ensure actionable alerting.
  • Integrate GPU operators/exporters and model lifecycle workflows for inference platforms.
  • Mentor junior and mid-level SREs and guide cross-team initiatives.

Requirements

  • 5–8 years of SRE or platform engineering experience.
  • Expert-level Kubernetes operations experience and cloud platform experience (AWS/GCP/Azure).
  • Advanced networking and security fundamentals.
  • Strong coding background (Python, Go, or Java).
  • Deep observability knowledge (Prometheus, Grafana, ELK/Fluentd).
