SWE, Applied Evals

OpenAI

6 months ago

San Francisco, CA, USA

Mid Level / Senior

Base Salary

$230k - $325k/yr

Responsibilities

Define core evaluation signals for model improvement.
Design reliable and extensible agents, harnesses, and eval pipelines.
Prototype solutions with real workflows and create scalable feedback loops.
Connect evaluation signals to research and training systems.
Collaborate with engineering, research, and product teams on model deployment and measurement.
Build reusable systems and tools that enable contributions from across the company.

Qualifications

4+ years of experience in software engineering with a track record of shipping production systems.
Experience building AI agents or applications, including designing evals.
Familiarity with evaluation methods for LLMs and multi-agent workflows.
Knowledge of deep learning concepts or prior exposure to training models.
Strong communication skills for technical and non-technical audiences.
Motivated by high-impact collaboration and able to thrive in ambiguity.