Machine Learning Engineer, LLM Evals & Observability

2 months ago

H1B Sponsor

Base Salary

$200k - $300k/yr

Responsibilities

Design and curate evaluation datasets that reliably cover assistant behavior.
Build and maintain large-scale evaluation pipelines measuring assistant quality.
Develop LLM-powered judges to score metrics like correctness and response quality.
Evaluate new models and product changes to provide quality signals before launch.
Create observability infrastructure for inspecting the behavior of AI agents.
Use eval results and customer feedback to drive improvements in assistant behavior.
Collaborate with engineers to integrate evaluations into the product shipping process.

Qualifications

2+ years of software engineering experience with strong coding skills.
Strong backend fundamentals in Go and Python; comfortable with distributed data pipelines.
Experience with LLM evaluation, reinforcement learning, or natural language processing.
Analytical rigor, with a focus on predicting real user experience.
Ability to thrive in a customer-focused, cross-functional team environment.
A strong commitment to quality in both systems and product.