Anyscale

Software Engineer, Platform Infrastructure (Foundations)

4 days ago
Palo Alto, CA, USA or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design, build, and scale services that orchestrate Ray clusters across cloud and on-prem environments.
  • Optimize control plane components for large-scale, distributed AI/ML workloads.
  • Build intelligent scheduling and resource management systems for heterogeneous compute clusters.
  • Develop features to enhance the reliability, performance, scalability, and observability of Anyscale-managed Ray workloads.
  • Support and optimize accelerator integration (e.g., GPUs, TPUs).
  • Handle container image management and dependency resolution for distributed workloads.
  • Participate in code reviews and in design and architecture discussions.
  • Provide on-call support and troubleshoot infrastructure issues.
  • Collaborate with distributed systems and machine learning experts.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • 3+ years of experience writing high-quality production code.
  • Hands-on experience building and maintaining highly available, scalable, and performant distributed systems.
  • Expertise in cloud-native technologies (AWS, Azure, GCP) and Kubernetes-based deployments.
  • Deep understanding of networking, security, and authentication mechanisms in cloud environments.
  • Familiarity with observability stacks (Prometheus, Grafana, etc.).
  • Proficiency in Go and Python.
  • Knowledge of low-level operating system foundations (Linux kernel, file systems, containers).

Tech Stack

AWS, Azure, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, Prometheus, Python

Categories

AI & ML, Backend, Data Engineering, DevOps