Sema4.ai

Staff Engineer, AI Evals

3 months ago
Atlanta, GA, USA
Staff+

Responsibilities

  • Design, build, and operate the core evaluation infrastructure for LLMs and agents.
  • Translate fuzzy goals into concrete, measurable signals for agent performance.
  • Solve complex evaluation problems related to multi-step agents and evolving tasks.
  • Use evaluation results to guide architectural decisions and model selection.
  • Participate in design reviews and set technical standards for evaluation rigor.

Requirements

  • 7+ years of software engineering experience, including 2+ years in AI/ML systems.
  • Deep experience with backend systems in Python, including data pipelines.
  • Hands-on experience evaluating LLM-based systems.
  • Strong intuition for metrics, experimentation, and failure analysis.
  • Excellent communication skills for collaboration with diverse stakeholders.
  • A high-ownership mindset regarding system integrity and decision-making.
