San Francisco, CA, USA
Mid Level / Senior
Base Salary
$380k - $555k/yr
Responsibilities
- Design and build high-performance inference runtimes for large-scale AI models.
- Own and optimize core execution paths, including model execution and memory management.
- Develop and improve distributed inference across multiple GPUs.
- Implement and optimize inference-critical operators and kernels.
- Partner with research teams to support new model architectures in inference systems.
- Diagnose and resolve performance bottlenecks through profiling and debugging.
- Contribute to the observability and reliability of large-scale AI systems.
Requirements
- Experience building production inference systems.
- Comfortable with GPU-centric performance engineering.
- Experience with multi-GPU or distributed systems.
- Ability to reason end-to-end about inference pipelines.
- Ability to understand research ideas and implement them within real system constraints.
- Enthusiasm for solving complex systems problems that emerge at scale.
- A preference for hands-on technical ownership over abstract design work.
Categories
AI & ML, Backend, DevOps