Arcana Analytics

Senior AI Engineer — Inference & Agent Systems

4 months ago
Bengaluru, India
Senior
H1B Sponsor

Responsibilities

  • Drive time-to-first-token (TTFT) below 400ms for multi-step agent pipelines.
  • Implement streaming optimizations that deliver the first token to users while sub-agents are still running.
  • Develop KV cache strategies, prompt compression, and dynamic context window management.
  • Design and implement Plan-Execute-Synthesize pipelines for parallel execution of sub-agents.
  • Build reliable orchestration using Temporal, including retries and partial failure recovery.
  • Own the evaluation framework, including ground truth datasets and automated scoring pipelines.
  • Conduct latency regression testing and design adversarial test cases for robustness.
  • Optimize model serving and cold start processes, ensuring observability of all operations.

Requirements

  • Proven experience building production systems that operate at meaningful scale.
  • Strong background in inference pipelines with a focus on latency metrics.
  • Experience with multi-step agent systems and understanding of their failure points.
  • Ability to build evaluation harnesses, and insight into what makes a ground truth dataset effective.
  • Familiarity with streaming LLM responses and infrastructure for handling partial outputs.
  • Proficiency in Go, Python, Temporal, Kafka, PostgreSQL, and Docker.

Categories

AI & ML, Data Engineering