2 days ago
Foster City, CA, USA +2 more
Mid Level / Senior
H-1B Sponsor
Base Salary
$242k - $290k/yr
Responsibilities
- Optimize large-scale models using advanced quantization and mixed-precision workflows.
- Architect and implement model conversion and compilation pipelines for edge deployment.
- Perform parity checking, accuracy recovery, and latency benchmarking between framework models and their compiled binaries.
- Write and optimize custom CUDA kernels and TensorRT plugins for AI accelerators.
- Develop production-level C++ and Python code for real-time inference on vehicle SoCs.
Requirements
- Deep expertise in model quantization and mixed-precision inference workflows.
- Proven experience optimizing large-scale models using KV-cache optimization and efficient attention mechanisms.
- Extensive experience with model conversion/compilation pipelines and benchmarking.
- Proficiency in low-level programming for AI accelerators, including CUDA and TensorRT.
- Production-level C++ and Python programming skills for real-time inference code.
Tech Stack
C++, Python, PyTorch
Categories
AI & ML, Embedded