Staff ML Engineer, Generative Model Performance & Efficiency

about 6 hours ago

Mountain View, CA, USA or New York, NY, USA

Staff+

H1B Sponsor

Base Salary

$251k - $310k/yr

Responsibilities

Analyze model architectures and identify bottlenecks in training and inference performance.
Apply and develop techniques such as quantization, pruning, and knowledge distillation.
Optimize model code for specific hardware accelerators like TPUs and GPUs.
Experiment with model partitioning and sharding strategies to improve scalability.
Design and implement low-latency, high-throughput serving solutions for generative models.
Build and maintain tools for performance analysis and debugging of ML models.

Requirements

MS or PhD in Computer Science, Machine Learning, Robotics, or a related field.
5+ years of experience with deep learning architectures and optimization techniques.
Proficiency in JAX and Flax, and ideally TensorFlow or PyTorch.
Expertise in using profiling tools to diagnose performance issues in ML workloads.
Hands-on experience with quantization, pruning, and other model compression methods.
Strong programming skills in Python, and ideally C++, following software development best practices.

Benefits

Eligible for Waymo’s discretionary annual bonus program.
Participation in the equity incentive plan.
Generous company benefits program.

Tech Stack

C++, Python, PyTorch, TensorFlow

Categories

AI & ML, Data Science