Reducto

Machine Learning Infra Engineer

about 16 hours ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$150k - $300k/yr

Responsibilities

  • Build and maintain a training and inference stack that supports fast iteration and flexibility.
  • Develop benchmarks to identify bottlenecks in the training and inference processes.
  • Explore state-of-the-art advances in training and inference and apply them.
  • Design systems for scaling model training across multi-node, multi-GPU environments.
  • Scale distributed training and inference workloads across large GPU clusters.
  • Build tooling and abstractions to help ML engineers transition from experiment to production.

Requirements

  • Strong Python skills and a background in systems engineering.
  • Experience with Kubernetes and distributed training frameworks.
  • Ability to solve complex problems and build from first principles.
  • Comfortable working in fast-changing, high-growth environments.
  • Effective collaboration across technical and non-technical teams.
  • Willingness to take full ownership from strategy through execution.

Benefits

  • Unlimited PTO for recharging.
  • Free daily lunch with teammates at the office.
  • Reimbursed transportation costs.
  • Generous health insurance covering medical, dental, and vision.
  • Health and wellness budget of up to $150/month.
  • Flexible parental leave schedule.

Tech Stack

Kubernetes, Python

Categories

AI & ML, Data Engineering