Santa Clara, CA, USA
Staff+
Base Salary
$215k - $364k/yr
Responsibilities
- Own the end-to-end quantization and optimization roadmap for large-scale multimodal models.
- Apply and advance Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), and pruning techniques.
- Collaborate with model researchers to ensure architectures are deployment-friendly.
- Develop and maintain robust, safety-critical deployment stacks in Modern C++.
Requirements
- 5-8 years of experience in model deployment, quantization, or high-performance computing.
- Mastery of Modern C++ and deep experience with CUDA or other hardware acceleration libraries.
- Strong familiarity with PyTorch and knowledge of inference engines like TensorRT, ONNX Runtime, or TVM.
- Hands-on experience with INT8/FP8/INT4 quantization and knowledge of the challenges of quantizing Large Language Models (LLMs).
- Solid understanding of computer architecture and experience with embedded/edge compute constraints.
- Ability to debug complex performance bottlenecks across the entire software stack.
Benefits
- A fun, supportive and engaging environment.
- Infrastructure and computational resources to support your ML model development and research.
- Opportunity to work on cutting-edge technologies with the top talent in the field.
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.