Foster City, CA, USA
Intern / Entry Level
H-1B Sponsor
Responsibilities
- Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions.
- Build and expand ground truth datasets for evaluating agent outputs.
- Identify gaps in benchmark validation and help build a comprehensive evaluation infrastructure.
- Develop new evaluation dimensions such as label accuracy and structured output correctness.
- Investigate failure modes in agent outputs and collaborate with engineers on fixes and improvements.
- Write scripts and tooling to automate data collection and metric reporting.
- Document findings, track benchmark trends, and present results to the team.
Requirements
- Currently enrolled in a B.S. or M.S. program in Computer Science, Data Science, Engineering, or a related field.
- Available for a minimum three-month assignment.
- Able to commit to at least 20 hours per week.
- Willing to work on-site at one of the office locations.
- Must adhere to Zoox confidentiality requirements.