Berlin, Germany · Senior / Staff+
Responsibilities
- Optimize real-time inference systems and multimodal models.
- Take ownership of machine learning models from research to production.
- Design benchmarks and prototypes to bring clarity to ill-defined problems.
- Collaborate with US-based leadership and engineering teams.
- Ensure performance, latency, and reliability are prioritized in product features.
Requirements
- Deep understanding of inference optimization, including frameworks such as vLLM or TensorRT-LLM.
- Hands-on experience with model acceleration methods such as quantization and distillation.
- Proficiency in C++, CUDA, Rust, or optimized Python for high-performance systems.
- Experience with distributed systems, Kubernetes, and multi-GPU inference.
- PhD in CS, Physics, Math, or equivalent practical experience in backend or ML systems.
- Professional fluency in English, both written and spoken.
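As context for the model-acceleration requirement above: one of the simplest techniques in that family is post-training weight quantization. The sketch below (illustrative only, not part of this role's codebase; names like `quantize_int8` are made up for the example) shows symmetric per-tensor int8 quantization with NumPy.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    # Map the largest absolute weight to the int8 limit 127.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding bounds the per-element reconstruction error by scale / 2.
```

Production systems (e.g. vLLM or TensorRT-LLM) use more sophisticated per-channel or activation-aware schemes, but the storage-vs-precision trade-off is the same idea.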
Benefits
- Full U.S. visa and relocation support may be available for candidates interested in relocating to the San Francisco Bay Area.
