HeyGen

Software Engineer, AI Compute Infrastructure

5 months ago
Toronto, Canada +4 more
Senior / Staff+
H1B Sponsor

Responsibilities

  • Design and implement mechanisms to optimize GPU and cluster utilization for AI models.
  • Build scalable frameworks for managing large compute jobs and data processing.
  • Develop observability and visualization tools for performance diagnostics.
  • Collaborate with AI teams to integrate acceleration techniques into pipelines.
  • Champion the use of modern cloud and container technologies for system scaling.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • 5+ years of experience in large-scale MLOps, AI infrastructure, or HPC systems.
  • Experience with data frameworks like Ray, Apache Spark, or LanceDB.
  • Strong proficiency in Python and C++ for infrastructure development.
  • Hands-on experience with orchestration frameworks like Kubernetes and Ray.
  • Familiarity with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Benefits

  • Competitive salary and benefits package.
  • Dynamic and inclusive work environment.
  • Opportunities for professional growth and advancement.
  • Collaborative culture that values innovation and creativity.
  • Access to the latest technologies and tools.

Categories

AI & ML, Data Engineering, DevOps