
LLM Inference Frameworks and Optimization Engineer
Together AI
Amsterdam, Netherlands (+2 more)
Mid Level / Senior
H-1B Sponsor
Base Salary: $160k - $230k/yr
Responsibilities
- Design and develop fault-tolerant, high-concurrency distributed inference engines.
- Implement and optimize distributed inference strategies for high-performance serving.
- Apply CUDA graph optimizations and PyTorch-based compilation to enhance efficiency.
- Collaborate with hardware teams on performance bottleneck analysis.
- Work with AI researchers to develop efficient model execution plans.
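To give a flavor of the high-concurrency serving work described above, here is a toy dynamic-batching loop in pure Python. It is an illustrative sketch only: all class and method names are hypothetical, and real engines such as vLLM implement continuous batching with far more machinery (paged KV caches, scheduling, preemption). The core idea shown is that requests arriving within a short window are grouped into one batched model call.

```python
import asyncio

# Toy dynamic batcher (illustrative only; not any real framework's API).
# Requests arriving within a short time window are merged into a single
# batched "forward pass" -- the basic idea behind batched LLM serving.

class DynamicBatcher:
    def __init__(self, max_batch: int = 8, window_s: float = 0.01):
        self.queue: asyncio.Queue = asyncio.Queue()
        self.max_batch = max_batch
        self.window_s = window_s

    async def submit(self, prompt: str) -> str:
        # Each caller enqueues its prompt with a future and awaits the result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self) -> None:
        while True:
            # Block for the first request, then collect more until the
            # batching window closes or the batch is full.
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.window_s
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One batched "model call" (stub model: uppercase the prompt).
            outputs = [p.upper() for p, _ in batch]
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main() -> list[str]:
    batcher = DynamicBatcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(p) for p in ["hi", "there"]))
    worker.cancel()
    return results

print(asyncio.run(main()))  # -> ['HI', 'THERE']
```

In a production engine the stub "model call" would be a GPU forward pass, and the batching window and size would be tuned against latency and throughput targets.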
Requirements
- 3+ years of experience in deep learning inference frameworks or high-performance computing.
- Familiarity with at least one LLM inference framework, such as TensorRT-LLM or vLLM.
- Proficient in Python and C++/CUDA for high-performance deep learning inference.
- Deep understanding of Transformer architectures and LLM optimization.
- Strong analytical problem-solving skills and excellent collaboration abilities.
Benefits
- Competitive compensation and startup equity.
- Health insurance and other competitive benefits.