Boston, MA, USA +2 more
Senior / Staff+
H1B Sponsor
Base Salary
$226k - $307k/yr
Responsibilities
- Allocate and distribute system resources to various models and inference engines running on the robot.
- Spearhead initiatives for better compute utilization through model sharing and improved scheduling.
- Optimize large-scale models using advanced quantization and mixed-precision inference frameworks.
- Architect and implement model conversion and compilation pipelines for edge deployment.
- Write low-latency, memory-safe C++ and CUDA code for real-time inference.
Requirements
- Deep experience with system and performance optimization on CPU/GPU systems.
- Expertise in real-time systems with constraints like processing latency and memory utilization.
- Expertise in model quantization and mixed-precision inference frameworks.
- Proficiency in low-level programming for AI accelerators and in optimizing custom ML operators.
- Production-level C++ and Python programming skills for real-time inference code.