Zoox

Part-Time Student Worker – AI Validation and Benchmarking Engineer

Foster City, CA, USA
Intern / Entry Level
H1B Sponsor

Responsibilities

  • Run and maintain the benchmark pipeline, analyzing results to identify routing errors and regressions.
  • Build and expand ground truth datasets for evaluating agent outputs.
  • Identify gaps in benchmark validation and help build a comprehensive evaluation infrastructure.
  • Develop new evaluation dimensions such as label accuracy and structured output correctness.
  • Investigate failure modes in agent outputs and collaborate with engineers on improvements.
  • Write scripts and tooling to automate data collection and metric reporting.
  • Document findings, track benchmark trends, and present results to the team.

Requirements

  • Currently enrolled in a B.S. or M.S. in Computer Science, Data Science, Engineering, or a related field.
  • Available for a minimum three-month assignment.
  • Able to commit to at least 20 hours per week.
  • Willing to work on-site at one of the office locations.
  • Must adhere to Zoox confidentiality requirements.

Categories

AI & ML, Data Science