
Software Engineer: ML Infra
Generalist
3 months ago
Somerville, MA, USA or San Mateo, CA, USA
Mid Level / Senior
H1B Sponsor
Base Salary
$200k - $350k/yr
Responsibilities
- Own and manage GPU compute fleets.
- Ensure GPUs are easy for researchers to use and maximally utilized.
- Optimize ML data loading, transport, and storage in distributed environments.
- Orchestrate robot inference fleets.
Requirements
- Experience managing large fleets of GPUs for distributed training or inference.
- Deep knowledge of Slurm or Kubernetes for ML workload orchestration.
- Experience building high-scale ML data loaders and preparation systems.
- Strong understanding of ML hardware, storage, and networking stacks.
- Familiarity with the NVIDIA GPU ecosystem.
Tech Stack
Kubernetes
Categories
AI & ML
Data Engineering