Together AI

LLM Inference Frameworks and Optimization Engineer

Amsterdam, Netherlands (+2 more)
Mid Level / Senior
H-1B Sponsor

Base Salary

$160k - $230k/yr

Responsibilities

  • Design and develop fault-tolerant, high-concurrency distributed inference engines.
  • Implement and optimize distributed inference strategies for high-performance serving.
  • Apply CUDA graph optimizations and PyTorch-based compilation to enhance efficiency.
  • Collaborate with hardware teams on performance bottleneck analysis.
  • Work with AI researchers to develop efficient model execution plans.

Requirements

  • 3+ years of experience in deep learning inference frameworks or high-performance computing.
  • Familiarity with at least one LLM inference framework, such as TensorRT-LLM or vLLM.
  • Proficient in Python and C++/CUDA for high-performance deep learning inference.
  • Deep understanding of Transformer architectures and LLM optimization.
  • Strong analytical problem-solving skills and excellent collaboration abilities.

Benefits

  • Competitive compensation and startup equity.
  • Health insurance and other competitive benefits.

Tech Stack

C++, Kubernetes, Python, PyTorch

Categories

AI & ML, Data Engineering