San Francisco, CA, USA
Mid Level / Senior
Base Salary
$140k - $320k/yr
Responsibilities
- Design and implement task-specific evaluations to improve agent quality.
- Define tasks, curate datasets, and build evaluation harnesses for various agents.
- Develop reusable frameworks for running evaluations at scale.
- Investigate orchestration strategies for complex, multi-step tasks.
- Experiment with post-training techniques to enhance model performance.
- Run rigorous experiments and analyze results to inform model configurations.
- Collaborate with cross-functional teams to align evaluations with platform primitives.
Requirements
- M.S. or Ph.D. in a relevant field, or equivalent practical experience.
- Strong background in machine learning and large language models.
- 2–5 years of experience with LLM technology and evaluation strategies.
- Proficiency in writing production-quality code, especially in Python.
- Experience designing and running experiments in real-world settings.
- Self-motivated and comfortable in high-ambiguity environments.
- Strong communication skills to translate vague goals into testable setups.
Benefits
- Significant equity in an early-stage, venture-backed startup.
- Comprehensive health benefits including medical, dental, and vision.
- Flexible PTO to recharge and maintain work-life balance.
