Staff AI Platform Engineer - Inference & Agentic Systems

about 2 months ago

Toronto, Canada

Staff+

Responsibilities

Build and operate multi-model serving across modalities on shared infrastructure.
Own the model lifecycle: download, deploy, serve, monitor, update, and swap.
Drive inference optimization across latency, throughput, and cost.
Architect and build the Agentic AI Platform for autonomous agents.
Design multi-agent coordination systems for complex workflows.
Build robust tool-use infrastructure for safe agent interactions.
Implement workflow automation for multi-step business and engineering tasks.
Develop evaluation and observability frameworks for agent behavior.
Define technical direction and architecture for agentic systems.
Mentor engineers and contribute to best practices for agent system design.

Qualifications

8+ years of software engineering experience, with 3+ years in AI systems or LLM applications.
Strong understanding of LLM-based agent architectures.
Experience building highly reliable distributed systems.
Proficiency in Python and experience with modern LLM APIs or open-source models.
Experience with model serving technologies.
Understanding of distributed systems and cloud platforms.
Strong understanding of security risks in agentic systems.
Demonstrated experience leading complex technical initiatives.
Strong written and verbal communication skills.