27 days ago
Santa Clara, CA, USA
Senior / Mid Level
Base Salary
$175k - $296k/yr
Responsibilities
- Optimize transformer-based LLMs for low-latency and high-throughput inference.
- Optimize kernels and model graphs using CUDA, Triton, and custom fused operators.
- Implement and benchmark techniques such as quantization and knowledge distillation.
- Deploy optimized models across GPUs, CPUs, and edge accelerators.
- Contribute to internal tooling and documentation for model optimization flows.
Requirements
- Master's degree in CS/CE/EE or equivalent with 3+ years of industry experience.
- Good knowledge of PyTorch.
- Knowledge of transformer architectures and techniques for accelerating training and inference.
Benefits
- A fun, supportive, and engaging environment.
- Infrastructure and computational resources to support your work.
- Opportunity to work on cutting-edge technologies with top talent in the field.
- Opportunity to make a significant impact on the transportation revolution.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.