Member of Technical Staff, Model Efficiency

8 months ago

Toronto, Canada +3 more

Senior / Staff+

H1B Sponsor

Responsibilities

Analyze model execution to identify performance bottlenecks.
Develop and implement optimizations for LLM inference efficiency.
Collaborate with modeling and systems teams to measure and ship improvements.
Work across the inference stack to enhance core performance metrics.
Experiment with advanced performance techniques, including GPU/CUDA optimizations.

Requirements

5+ years of experience writing high-performance, production-quality code.
Strong programming skills in C++ or Python; experience with Rust or Go is a plus.
Experience with large language models and the LLM inference ecosystem.
Ability to diagnose and resolve performance bottlenecks.
A strong bias for action, with a focus on shipping fast and measuring impact.

Benefits

Open and inclusive culture and work environment.
Weekly lunch stipend, in-office lunches, and snacks.
Full health and dental benefits, including mental health support.
100% parental leave top-up for up to 6 months.
Personal enrichment benefits for arts, culture, fitness, and workspace improvement.
Remote-flexible work arrangements with offices in major cities.
6 weeks of vacation (30 working days).

Tech Stack

C++, Go, Python, Rust

Categories

AI & ML, Data Engineering