Dyna Robotics

Staff Machine Learning Infrastructure Engineer

about 1 month ago
Foster City, CA, USA
Staff+
H1B Sponsor

Base Salary

$220k - $320k/yr

Responsibilities

  • Architect and own the infrastructure for large-scale GPU clusters.
  • Implement sharding, activation checkpointing, and memory optimization for training massive multimodal models.
  • Build a research codebase and job scheduling system that prioritizes fast iteration and automated retries.
  • Design high-throughput pipelines to ingest and transform terabytes of multimodal robot data.
  • Build low-latency inference pipelines for real-time robot control.
  • Conduct deep systems profiling to optimize GPU utilization and performance.

Requirements

  • 7+ years of engineering experience in high-performance computing or ML infrastructure.
  • Deep experience with PyTorch and distributed training frameworks.
  • Hands-on experience managing cloud GPU environments and container orchestration.
  • Fundamental understanding of distributed systems, including memory management and communication.
  • Ownership mindset with a focus on designing and operating systems end-to-end.

Tech Stack

AWS, Google Cloud Platform, Kubernetes, PyTorch

Categories

AI & ML, Data Engineering, DevOps