GrepJob
Generalist

Software Engineer: ML Infra

3 months ago
Somerville, MA, USA or San Mateo, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$200k - $350k/yr

Responsibilities

  • Own and manage GPU compute fleets.
  • Keep GPUs easy for researchers to use and maximally utilized.
  • Optimize ML data loading, transport, and storage in distributed environments.
  • Orchestrate robot inference fleets.

Requirements

  • Experience managing large fleets of GPUs for distributed training or inference.
  • Deep knowledge of Slurm or Kubernetes for ML workload orchestration.
  • Experience building high-scale ML data loaders and preparation systems.
  • Strong understanding of ML hardware, storage, and networking stacks.
  • Familiarity with the NVIDIA GPU ecosystem.

Tech Stack

Kubernetes

Categories

AI & ML, Data Engineering