New York, NY, USA +2 more
Mid Level
H1B Sponsor
Base Salary
$320k - $485k/yr
Responsibilities
- Design and run evaluations of Claude's capabilities, producing visualizations for stakeholders.
- Build and maintain a distributed platform that executes evaluations reliably.
- Manage dashboards for monitoring model health during training.
- Debug evaluation results during training runs and communicate findings under pressure.
- Improve the tools and workflows researchers use to implement evaluations.
- Collaborate with research teams to define metrics and interpret results.
- Conduct experiments to analyze the impact of various factors on evaluation results.
- Communicate evaluation results to internal and external audiences.
Requirements
- Strong Python programming skills, including experience with production or research infrastructure.
- Experience with distributed systems, data pipelines, or reliable infrastructure at scale.
- Excellent written and verbal communication skills, especially for non-specialist audiences.
- Ability to operate in an on-call or production-support capacity during live training.
- A commitment to understanding the societal impacts of AI and a desire to ensure it is safe and beneficial.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.
Tech Stack
Python
Categories
AI & ML
Data Science