HeyGen

Tech Lead, AI Compute Infrastructure

5 months ago
Toronto, Canada +4 more
Senior / Staff+
H1B Sponsor

Responsibilities

  • Design and implement mechanisms to optimize GPU and cluster utilization across thousands of devices.
  • Build scalable frameworks for launching and managing large-scale AI jobs.
  • Develop observability, tracing, and visualization tools for compute clusters.
  • Collaborate with AI researchers to integrate acceleration techniques into production pipelines.
  • Champion the adoption of modern cloud and container technologies for distributed systems.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in large-scale MLOps, AI infrastructure, or HPC systems.
  • Experience with data frameworks like Ray, Apache Spark, and LanceDB.
  • Strong proficiency in Python and a high-performance language such as C++.
  • Hands-on experience with orchestration and distributed computing frameworks like Kubernetes and Ray.
  • Experience with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Benefits

  • Competitive salary and benefits package.
  • Dynamic and inclusive work environment.
  • Opportunities for professional growth and advancement.
  • Collaborative culture that values innovation and creativity.
  • Access to the latest technologies and tools.

Categories

AI & ML, Data Engineering, DevOps