Forward Deployed Engineer (Inference & Post-Training)

4 days ago

San Francisco, CA, USA

Senior / Staff+

H-1B Sponsor

Base Salary

$270k - $300k/yr

Responsibilities

Select, configure, and optimize inference engines based on hardware and workload profiles.
Develop configuration updates to enhance POCs and optimize customer deployments.
Drive hands-on RL training runs and guide customers through post-training pipelines.
Act as the primary technical point of contact for strategic accounts.
Establish alignment with customers during onboarding to improve time-to-value.
Influence the software and model roadmap by providing field insights.

Requirements

5+ years in a technical role focused on inference systems or post-training workflows.
Expert-level experience with inference engines such as vLLM and TensorRT-LLM.
Deep knowledge of KV cache tuning, speculative decoding, and quantization techniques.
Hands-on experience with fine-tuning and post-training pipelines.
Broad knowledge of state-of-the-art open-source models and their applications.
Strong Python skills and comfort in production environments.

Benefits

Competitive compensation and startup equity.
Health insurance and other benefits.
Flexible remote work arrangements.

Categories

AI & ML
Data Science