London, United Kingdom (+3 more locations)
Mid Level / Senior
Base Salary
$230k - $325k/yr
Responsibilities
- Design and iterate on agent behaviors for real-world coding tasks.
- Develop and run evaluations to measure agent performance and identify failure modes.
- Enhance performance through prompting, tool-use strategies, and experimentation.
- Analyze production failures to improve robustness and reliability.
- Build feedback loops and data systems that yield better real-task data.
- Collaborate with product teams to shape user-facing agent experiences.
- Define success criteria for agents completing complex tasks.
Requirements
- Experience building or shipping machine learning or LLM-powered products.
- Strong proficiency in Python and familiarity with modern ML tooling.
- Experience with model evaluation, fine-tuning, or prompt design.
- Ability to think in terms of systems and user outcomes.
- Enjoyment of debugging real-world failures and implementing improvements.
- Desire to work on systems that translate research into user-friendly applications.
Tech Stack
Python
Categories
AI & ML
Data Science