Paytm

Staff Platform Engineer - AI Infrastructure

about 6 hours ago
Toronto, Canada
Staff+

Responsibilities

  • Design and operate GPU infrastructure for model hosting, including provisioning and scheduling.
  • Build and scale model serving systems supporting real-time inference.
  • Implement multi-model routing for various modalities on shared infrastructure.
  • Own the model lifecycle from download to deployment and monitoring.
  • Drive inference optimization strategies including quantization and caching.
  • Build self-service infrastructure platforms for teams to provision resources.
  • Implement infrastructure-as-code at scale using tools like Terraform.
  • Build observability and reliability for inference systems.
  • Define platform standards and governance for resource management.
  • Lead architectural design and influence engineering direction.

Requirements

  • 8+ years of software engineering experience, including 3+ years in infrastructure platforms or ML/AI infrastructure.
  • Deep experience with cloud infrastructure (AWS, GCP) and Kubernetes.
  • Hands-on experience with GPU workloads and model serving technologies.
  • Strong software engineering skills in Python, Go, or C++.
  • Experience with infrastructure-as-code tools like Terraform or Pulumi.
  • Experience designing self-service platforms or internal developer tooling.
  • Understanding of model optimization techniques.
  • Proven ability to lead complex cross-team technical initiatives.
  • Strong communication skills to influence technical direction.

Tech Stack

AWS, C++, Go, Google Cloud Platform, Kubernetes, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps