
Senior AI Engineer — Inference & Agent Systems
Arcana Analytics · 4 months ago
Responsibilities
- Reduce time-to-first-token (TTFT) to under 400 ms for multi-step agent pipelines.
- Implement streaming optimizations that deliver the first token to users while sub-agents are still running.
- Develop KV cache strategies, prompt compression, and dynamic context window management.
- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel.
- Build reliable orchestration on Temporal, including retries and recovery from partial failures.
- Own the evaluation framework, including ground truth datasets and automated scoring pipelines.
- Conduct latency regression testing and design adversarial test cases for robustness.
- Optimize model serving and cold starts, and ensure every operation is observable.
Requirements
- Proven experience building production systems that operate at meaningful scale.
- Strong background in inference pipelines with a focus on latency metrics.
- Experience with multi-step agent systems and an understanding of where they fail.
- Ability to build evaluation harnesses, and a clear sense of what makes a ground truth dataset effective.
- Familiarity with streaming LLM responses and infrastructure for handling partial outputs.
- Proficiency in Go, Python, Temporal, Kafka, PostgreSQL, and Docker.
Categories
AI & ML, Data Engineering