Member of Technical Staff, RL Research & Environments

over 1 year ago

H-1B Sponsor

Base Salary

$200k - $550k/yr

Responsibilities

Design and build post-training datasets using synthetic generation and targeted data collection.
Implement filtering, scoring, and mixture strategies for RL and post-training corpora.
Build and maintain evaluation frameworks to identify long-context failure modes.
Design reward signals and training environments for targeted capability improvements.
Run ablations across data sources, reward designs, and long-horizon task structures.
Improve reliability and observability of post-training data and environment pipelines.
Collaborate closely with Product and Research to translate capability goals into measurable iteration cycles.

Qualifications

Strong software engineering fundamentals.
Experience building or operating large-scale data or ML systems.
Ability to design and interpret experiments that measure model behavior changes.
Comfort working at the intersection of ML, data systems, and infrastructure.
Strong attention to data quality and evaluation rigor.
Track record of owning experimental or production systems end-to-end.