
LLM Inference Frameworks and Optimization Engineer
Together AI
Amsterdam, Netherlands (+2 more)
Mid Level / Senior
H-1B Sponsor
Base Salary: $160k - $230k/yr
Responsibilities
- Design and develop fault-tolerant, high-concurrency distributed inference engines.
- Implement and optimize distributed inference strategies for high-performance serving.
- Apply CUDA graph optimizations and PyTorch-based compilation to enhance efficiency.
- Collaborate with hardware teams on performance bottleneck analysis.
- Work with AI researchers to develop efficient model execution plans.
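To give a flavor of the high-concurrency serving work described above, here is a toy dynamic-batching loop in pure Python. It is an illustrative sketch only: all class and method names are hypothetical, and real engines such as vLLM implement continuous batching with far more machinery (paged KV caches, scheduling, preemption). The core idea shown is that requests arriving within a short window are grouped into one batched model call.

```python
import asyncio

# Toy dynamic batcher (illustrative only; not any real framework's API).
# Requests arriving within a short time window are merged into a single
# batched "forward pass" -- the basic idea behind batched LLM serving.

class DynamicBatcher:
    def __init__(self, max_batch: int = 8, window_s: float = 0.01):
        self.queue: asyncio.Queue = asyncio.Queue()
        self.max_batch = max_batch
        self.window_s = window_s

    async def submit(self, prompt: str) -> str:
        # Each caller enqueues its prompt with a future and awaits the result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self) -> None:
        while True:
            # Block for the first request, then collect more until the
            # batching window closes or the batch is full.
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.window_s
            while len(batch) < self.max_batch:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            # One batched "model call" (stub model: uppercase the prompt).
            outputs = [p.upper() for p, _ in batch]
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)

async def main() -> list[str]:
    batcher = DynamicBatcher()
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(p) for p in ["hi", "there"]))
    worker.cancel()
    return results

print(asyncio.run(main()))  # -> ['HI', 'THERE']
```

In a production engine the stub "model call" would be a GPU forward pass, and the batching window and size would be tuned against latency and throughput targets.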
Requirements
- 3+ years of experience in deep learning inference frameworks or high-performance computing.
- Familiarity with at least one LLM inference framework, such as TensorRT-LLM or vLLM.
- Proficient in Python and C++/CUDA for high-performance deep learning inference.
- Deep understanding of Transformer architectures and LLM optimization.
- Strong analytical problem-solving skills and excellent collaboration abilities.
Benefits
- Competitive compensation and startup equity.
- Health insurance and other competitive benefits.