Senior ML Systems Engineer, Frameworks & Tooling


Toronto, Canada (+5 more locations)
Senior

H-1B Sponsor

Qualifications

Strong engineering experience in large-scale distributed training or HPC systems.
Deep familiarity with JAX internals and distributed training libraries.
Experience with multi-node cluster orchestration tools like Slurm or Kubernetes.
Comfort debugging performance issues across CUDA/NCCL and data pipelines.
Experience with containerized environments such as Docker.
A track record of building tools that enhance developer velocity for ML teams.
Strong collaboration skills for working with infrastructure, research, and deployment teams.

Benefits

An open and inclusive culture and work environment.
Weekly lunch stipend, in-office lunches, and snacks.
Full health and dental benefits, including mental health support.
100% parental leave top-up for up to 6 months.
Personal enrichment benefits for arts, culture, fitness, and workspace improvement.
Remote-flexible work options with offices in major cities.
6 weeks of vacation (30 working days).