Zoox

AI Inference Engineer - Model Optimization & Deployment

2 days ago
Foster City, CA, USA +2 more
Mid Level / Senior
H1B Sponsor

Base Salary

$242k - $290k/yr

Responsibilities

  • Optimize large-scale models using advanced quantization and mixed-precision workflows.
  • Architect and implement model conversion and compilation pipelines for edge deployment.
  • Perform parity checking, accuracy recovery, and latency benchmarking between framework-level models and their compiled binaries.
  • Write and optimize custom CUDA kernels and TensorRT plugins for AI accelerators.
  • Develop production-level C++ and Python code for real-time inference on vehicle SoCs.

Requirements

  • Deep expertise in model quantization and mixed-precision inference workflows.
  • Proven experience optimizing large-scale models using KV-cache optimization and efficient attention mechanisms.
  • Extensive experience with model conversion/compilation pipelines and benchmarking.
  • Proficiency in low-level programming for AI accelerators, including CUDA and TensorRT.
  • Production-level C++ and Python programming skills for real-time inference code.

Tech Stack

C++, Python, PyTorch

Categories

AI & ML, Embedded