Senior AI Engineer, Agentic Evaluation & V&V

about 2 months ago

Remote, Worldwide

Senior

Base Salary

$150k - $250k/yr

Responsibilities

Extend and maintain Slingshot’s V&V SDK and evaluation framework for agentic AI systems.
Design and implement agent-level and end-to-end evaluations, including their scoring logic.
Build benchmark scenarios and tooling to measure planning, reasoning, and operational performance.
Translate astrodynamics and mission-domain concepts into executable evaluation scenarios.
Develop reusable SDK interfaces and evaluation utilities that connect V&V systems and agent workflows.
Define and apply metrics for capability evaluation and failure analysis.
Partner with cross-functional teams to identify evaluation needs.
Contribute to best practices for evaluating complex, autonomous AI systems.
Uphold strong engineering standards through testing and documentation.

Qualifications

6+ years of experience in software engineering, machine learning engineering, or applied AI.
Strong Python engineering skills with experience building SDKs or evaluation tooling.
Experience designing evaluation frameworks, benchmarks, or test harnesses for AI/ML systems.
Ability to analyze system behavior and evaluate performance in complex systems.
Familiarity with modern agent frameworks and orchestration patterns.
Experience working in cross-functional, multidisciplinary teams.
Strong written and verbal communication skills.
Bachelor’s degree in a relevant science or engineering field.
Must be a U.S. citizen and eligible for a government security clearance.