Nebius

Senior Site Reliability Engineer (In-Office Required)

Posted about 3 hours ago
New York, NY, USA
Senior / Mid Level

Base Salary

$156k - $262k/yr

Responsibilities

  • Managing Kubernetes clusters across multiple environments and regions.
  • Owning infrastructure as code for all resources.
  • Maintaining and improving CI/CD pipelines and GitOps-based deployments.
  • Maintaining and optimizing real-time data pipelines that process billions of events per day across distributed queues and stream processors.
  • Building out monitoring, alerting, and observability.
  • Debugging production issues across services.
  • Managing cloud costs and capacity planning.
  • Working closely with a small engineering team: you own the infrastructure as a whole, not a slice of it.

Requirements

  • 5-8 years in a DevOps or SRE role, working in production environments.
  • Proven experience designing and operating large-scale, distributed systems, with a solid understanding of API design, reliability, and performance at scale.
  • Strong Kubernetes experience in a managed cloud environment.
  • Proficiency with infrastructure as code (Terraform or similar).
  • Experience with GitOps-based deployment workflows.
  • Experience building or maintaining observability stacks (logging, metrics, alerting).
  • Experience handling production incidents calmly and methodically.

Benefits

  • 100% company-paid medical, dental, and vision coverage for employees and families.
  • Up to 4% company match on 401(k) plan with immediate vesting.
  • 20 weeks paid parental leave for primary caregivers, 12 weeks for secondary caregivers.
  • Up to $85/month reimbursement for mobile and internet.
  • Company-paid short-term disability, long-term disability, and life insurance coverage.

Categories

AI & ML, Data Engineering, DevOps