Machine Learning Engineer - Inference

about 2 months ago

San Francisco, CA, USA

Mid Level / Senior

H1B Sponsor

Base Salary

$160k - $230k/yr

Responsibilities

Design and build production systems for the Together AI inference engine.
Develop and optimize runtime inference services for large-scale AI applications.
Collaborate with researchers, engineers, product managers, and designers.
Conduct design and code reviews to ensure high standards of quality.
Create services, tools, and developer documentation for the inference engine.
Implement robust and fault-tolerant systems for data ingestion and processing.

Requirements

3+ years of experience writing high-performance, well-tested, production-quality code.
Proficiency with Python and PyTorch.
Experience building high-performance libraries and tooling.
Excellent understanding of low-level operating system concepts.
Preferred: Knowledge of existing AI inference systems like TGI, vLLM, TensorRT-LLM, Optimum.
Preferred: Knowledge of AI inference techniques such as speculative decoding.
Preferred: Knowledge of CUDA/Triton programming.
Nice to have: Knowledge of Rust, Cython, and compilers.

Benefits

Competitive compensation and startup equity.
Health insurance and other competitive benefits.

Tech Stack

Python, PyTorch, Rust

Categories

AI & ML, Data Engineering