Anyscale

Site Reliability Engineer

Palo Alto, CA, USA or San Francisco, CA, USA
Senior / Staff+
H1B Sponsor

Responsibilities

  • Define and drive the multi-year technical roadmap for Ray cluster orchestration.
  • Lead the design and optimization of high-performance control plane components.
  • Establish organization-wide standards for reliability, scalability, and observability.
  • Direct the long-term strategy for accelerator integration and container management.
  • Lead complex design discussions and ensure engineering excellence.
  • Partner with ML experts to translate market needs into infrastructure foundations.

Requirements

  • 5+ years of experience writing production code and leading distributed systems projects.
  • Proven track record in designing and maintaining scalable cloud-native platforms.
  • Deep expertise in Kubernetes-based deployments and container orchestration.
  • Advanced knowledge of the Linux kernel, networking, and operating system foundations.
  • Mastery of Go and Python with the ability to set coding standards.
  • Demonstrated ability to mentor engineers and influence technical direction.