San Francisco, CA, USA or New York, NY, USA
Mid Level / Senior
H-1B Sponsor
Base Salary
$214k - $300k/yr
Responsibilities
- Build and improve scalable eval runners and harnesses for local, CI, and scheduled runs.
- Create better templates, fixtures, debugging tools, and workflows for high-signal evals.
- Develop and maintain benchmark and dataset tooling, including curation pipelines and regression tracking.
- Enhance reliability and observability for eval execution, focusing on retries and failure triage.
- Collaborate with AI product, AI platform, and infrastructure teams to integrate evals into workflows.
Requirements
- Strong software engineering fundamentals and experience shipping production systems.
- Proficiency with TypeScript/Node and/or Python.
- Experience building reliable systems in distributed environments.
- Comfort with data pipelines, including batch processing and data quality.
- Practical experience designing measurement or evaluation systems.
