about 2 months ago
Mountain View, CA, USA
Mid Level / Senior
H1B Sponsor
Base Salary
$200k - $300k/yr
Responsibilities
- Design and curate evaluation datasets that reliably cover assistant behavior.
- Build and maintain large-scale evaluation pipelines for measuring assistant quality.
- Develop LLM-powered judges to score metrics like correctness and response quality.
- Evaluate new models and product changes to provide quality signals before launch.
- Create observability infrastructure that makes AI agent behavior easier to inspect.
- Use eval results and customer feedback to drive improvements in assistant behavior.
- Collaborate with engineers to integrate evaluations into the product shipping process.
Requirements
- 2+ years of software engineering experience with strong coding skills.
- Strong backend fundamentals in Go and Python, with experience in distributed data pipelines.
- Experience in LLM evaluation, reinforcement learning, or natural language processing.
- Analytically rigorous with a focus on predicting real user experience from metrics.
- Ability to thrive in a customer-focused, cross-functional team environment.
- A strong commitment to quality in both systems and product outcomes.
Benefits
- Comprehensive benefits package including medical, vision, and dental coverage.
- Generous time-off policy and 401(k) plan contributions.
- Home office improvement stipend and annual education and wellness stipends.
- Vibrant company culture with regular events and daily healthy lunches.
