Guild.ai

AI Engineer, Agents & Evaluation

5 months ago
San Francisco, CA, USA
Mid Level / Senior

Base Salary

$140k - $320k/yr

Responsibilities

  • Design and implement task-specific evaluations to improve agent quality.
  • Define tasks, curate datasets, and build evaluation harnesses for various agents.
  • Develop reusable frameworks for running evaluations at scale.
  • Investigate orchestration strategies for complex, multi-step tasks.
  • Experiment with post-training techniques to enhance model performance.
  • Run rigorous experiments and analyze results to inform model configurations.
  • Collaborate with cross-functional teams to align evaluations and platform primitives.

Requirements

  • MS or Ph.D. in a relevant field or equivalent practical experience.
  • Strong background in machine learning and large language models.
  • 2–5 years of experience with LLM technology and evaluation strategies.
  • Proficiency in writing production-quality code, especially in Python.
  • Experience designing and running experiments in real-world settings.
  • Self-motivated and comfortable in high-ambiguity environments.
  • Strong communication skills to translate vague goals into testable setups.

Benefits

  • Significant equity in an early-stage, venture-backed startup.
  • Comprehensive health benefits including medical, dental, and vision.
  • Flexible PTO to recharge and maintain work-life balance.

Tech Stack

Python, TypeScript

Categories

AI & ML, Data Science, Testing