Senior Software Engineer — LLM Post-Training Platform

about 2 months ago

Bellevue, WA, USA

Senior

H1B Sponsor

Base Salary

$160k - $230k/yr

Responsibilities

Design and build full-stack solutions, from public training APIs down to GPU data planes.
Scale distributed systems for serverless GPU compute with multi-tenant scheduling and fault tolerance.
Drive end-to-end performance for training, inference, and RL loops under heavy load.
Productionize research techniques into reliable components for enterprise use.

Requirements

5+ years of experience building and shipping production ML systems.
Strong foundation in distributed systems and infrastructure, particularly on Kubernetes.
Familiarity with GPU and LLM infrastructure, including PyTorch and CUDA.
Proven ability to enhance system reliability, throughput, and cost efficiency.
BS in Computer Science or related field; MS/PhD is a plus.
Hands-on experience with LLM post-training is a bonus.

Tech Stack

Kubernetes, PyTorch

Categories

AI & ML, Backend, Data Engineering