Software Engineer, AI Compute Infrastructure

5 months ago

Toronto, Canada +4 more
Senior / Staff+

H1B Sponsor

Responsibilities

Design and implement mechanisms to optimize GPU and cluster utilization for AI models.
Build scalable frameworks for managing large compute jobs and data processing.
Develop observability and visualization tools for performance diagnostics.
Collaborate with AI teams to integrate acceleration techniques into pipelines.
Champion the use of modern cloud and container technologies for system scaling.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field.
5+ years of experience in large-scale MLOps, AI infrastructure, or HPC systems.
Experience with data frameworks and tools such as Ray, Apache Spark, or LanceDB.
Strong proficiency in Python and C++ for infrastructure development.
Hands-on experience with orchestration frameworks like Kubernetes and Ray.
Familiarity with core ML frameworks such as PyTorch, TensorFlow, or JAX.

Benefits

Competitive salary and benefits package.
Dynamic and inclusive work environment.
Opportunities for professional growth and advancement.
Collaborative culture that values innovation and creativity.
Access to the latest technologies and tools.

Tech Stack

Apache Spark, C++, Kubernetes, Python, PyTorch, TensorFlow

Categories

AI & ML, Data Engineering, DevOps