about 17 hours ago
San Francisco, CA, USA or New York, NY, USA
Mid Level / Senior
H-1B Sponsor
Base Salary
$320k - $485k/yr
Responsibilities
- Build and own the evaluation harness for an agentic investigation system.
- Construct high-quality eval datasets representing real-world misuse.
- Measure agent performance end-to-end and drive improvements.
- Analyze coverage to identify measurement gaps and evolve evaluations.
- Productionize successful research into regression and release pipelines.
- Build tooling for policy experts to run evaluations independently.
- Construct RL environments to enhance safety investigation capabilities.
Requirements
- Proficiency in Python and comfort working across the stack.
- Experience building and maintaining data pipelines.
- Experience with LLMs and an understanding of their capabilities and failure modes.
- Strong data analysis skills to draw insights from large datasets.
- Ability to transition between research prototyping and production-quality code.
- Ability to translate ambiguous problems into concrete, testable experiments.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.