Base Salary
$295k - $445k/yr
Responsibilities
- Design and run experiments to improve model behavior in API and power-user workflows.
- Build evaluations and environments based on real developer workflows.
- Transform observed model failures into training data and hypotheses.
- Partner with users to identify behavior gaps and implement post-training interventions.
- Own end-to-end model behavior projects from analysis to launch readiness.
- Develop feedback loops using user traces and API patterns to identify model gaps.
- Decide which agentic capabilities and behavioral fixes to prioritize for major model runs.
- Debug failures in shipped models by analyzing traces and evals.
- Work on early-training and alignment interventions to shape agent behavior.
- Improve large-scale training processes for reliability and efficiency.
Requirements
- Strong technical fundamentals in ML, software engineering, or applied research.
- Hands-on experience with LLMs, post-training, and production ML systems.
- Ability to form concrete hypotheses about model behavior from evaluations.
- Excitement for solving ambiguous capability problems with noisy signals.
- Deep care for developer and expert-user experience in real workflows.
- Comfort working across research, product, and infrastructure teams.
- Willingness to build systems and processes as needed by the team.
- Desire to train and ship models that improve usability for a broad range of users.