Posted: 5 days ago
Location: Remote, United States
Level: Staff+
Visa: H1B sponsorship available
Base Salary: $188k - $200k/yr
Responsibilities
- Architect and scale production-grade AI infrastructure for quality and reliability.
- Design an end-to-end AI evaluation framework spanning offline evaluations and human feedback.
- Define performance metrics and build evaluation datasets that catch regressions before release.
- Architect reusable agent infrastructure using frameworks such as LangGraph.
- Make build-vs-buy decisions for LLM providers and evaluation tooling.
- Own projects end-to-end, from scoping to delivery.
- Set the technical direction for agent quality across engineering teams.
- Lead discussions on AI system design and evaluation methodology.
- Mentor engineers and raise the bar for technical communication.
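The evaluation responsibilities above can be sketched as a minimal offline harness in Python. This is an illustrative sketch only, not the company's actual framework: the names (`EvalCase`, `run_offline_eval`, `exact_match`, `echo_model`) are hypothetical, and a real system would swap the stub model for an LLM call and use richer metrics than exact match.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> float:
    """Simplest possible metric: 1.0 on a case-insensitive exact match."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_offline_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    metric: Callable[[str, str], float] = exact_match,
) -> dict:
    """Score a model over a fixed dataset; the report can gate releases on regressions."""
    scores = [metric(model(c.prompt), c.expected) for c in cases]
    return {"n": len(scores), "mean_score": sum(scores) / len(scores)}

# Hypothetical stub standing in for an LLM call.
def echo_model(prompt: str) -> str:
    return {"capital of France?": "Paris"}.get(prompt, "unknown")

report = run_offline_eval(echo_model, [
    EvalCase("capital of France?", "Paris"),
    EvalCase("capital of Spain?", "Madrid"),
])
print(report)  # {'n': 2, 'mean_score': 0.5}
```

Tracking `mean_score` per dataset over time is one simple way to detect the regressions the role is asked to prevent.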
Requirements
- 8+ years of experience in production-level code, with 5+ years in AI/ML systems.
- Deep production experience with LLM systems and agentic systems.
- Strong command of AI evaluation methodology and statistical experimentation.
- Proficiency in production-grade Python for building maintainable systems.
- Experience with LangGraph or similar frameworks and LLM observability tools.
- Familiarity with vector databases and retrieval system design.
- Experience operating AI systems in cloud environments like AWS.
- Active engagement with AI research and industry trends.
Benefits
- Medical, dental, and vision insurance.
- Life, AD&D, and disability insurance.
- Paid parental leave and paid time off including holidays.
- Commuter and parking accounts.
- Lunches in the office and internet/phone stipend.
- 401(k) retirement plan and financial planning support.
- Learning and development budget.
Tech Stack
AWS, Datadog, MLflow, Python, TypeScript
Categories
AI & ML, Data Science
