Toronto, Canada
Staff+
Responsibilities
- Design and operate GPU infrastructure for model hosting, including provisioning and scheduling.
- Build and scale model serving systems supporting real-time inference.
- Implement multi-model routing for various modalities on shared infrastructure.
- Own the model lifecycle from download to deployment and monitoring.
- Drive inference optimization strategies including quantization and caching.
- Build self-service infrastructure platforms for teams to provision resources.
- Implement infrastructure-as-code at scale using tools like Terraform.
- Build observability and reliability tooling for inference systems.
- Define platform standards and governance for resource management.
- Lead architectural design and influence engineering direction.
Requirements
- 8+ years of software engineering experience with 3+ years in infrastructure platforms or ML/AI infrastructure.
- Deep experience with cloud infrastructure (AWS, GCP) and Kubernetes.
- Hands-on experience with GPU workloads and model serving technologies.
- Strong software engineering skills in Python, Go, or C++.
- Experience with infrastructure-as-code tools like Terraform or Pulumi.
- Experience designing self-service platforms or internal developer tooling.
- Understanding of model optimization techniques (e.g., quantization, caching).
- Proven ability to lead complex cross-team technical initiatives.
- Strong communication skills to influence technical direction.
Tech Stack
AWS · C++ · Go · Google Cloud Platform · Kubernetes · Python · Terraform
Categories
AI & ML · Data Engineering · DevOps