Scale AI

Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI

4 months ago
New York, NY, USA or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$181k - $315k/yr

Responsibilities

  • Build, profile, and optimize the training and inference framework.
  • Post-train state-of-the-art models to define stable post-training recipes.
  • Collaborate with ML teams to accelerate research and development.
  • Create a next-gen agent training algorithm for multi-agent/multi-tool rollouts.

Requirements

  • 1-3 years of LLM training experience in a production environment.
  • Passion for system optimization.
  • Experience with post-training methods like RLHF/RLVR and algorithms like PPO/GRPO.
  • Working knowledge of modern GPU cluster architecture and the ability to operate such clusters.
  • Experience with multi-node LLM training and inference.
  • Strong software engineering skills, including proficiency in CUDA, PyTorch, and transformers.
  • Strong written and verbal communication skills.
  • PhD or Master's degree in Computer Science or a related field.

Benefits

  • Comprehensive health, dental, and vision coverage.
  • Retirement benefits.
  • Learning and development stipend.
  • Generous PTO.
  • Potential commuter stipend.

Tech Stack

PyTorch

Categories

AI & ML, Data Science