Base Salary
$251k - $310k/yr
Responsibilities
- Analyze model architectures and identify bottlenecks in training and inference performance.
- Apply and develop techniques such as quantization, pruning, and knowledge distillation.
- Optimize model code for specific hardware accelerators like TPUs and GPUs.
- Experiment with model partitioning and sharding strategies to improve scalability.
- Design and implement low-latency, high-throughput serving solutions for generative models.
- Build and maintain tools for performance analysis and debugging of ML models.
Requirements
- MS or PhD in Computer Science, Machine Learning, Robotics, or a related field.
- 5+ years of experience with deep learning architectures and optimization techniques.
- Proficiency in JAX and Flax, and ideally TensorFlow or PyTorch.
- Expertise in using profiling tools to diagnose performance issues in ML workloads.
- Hands-on experience with quantization, pruning, and other model compression methods.
- Strong programming skills in Python (ideally also C++) and a solid grasp of software development best practices.
Benefits
- Eligible for Waymo’s discretionary annual bonus program.
- Participation in equity incentive plan.
- Generous company benefits program.
