Cohere

Member of Technical Staff, Model Efficiency

Posted 6 months ago
Toronto, Canada +3 more
Senior / Staff+
H-1B Sponsor

Responsibilities

  • Analyze model execution to identify performance bottlenecks.
  • Develop and implement optimizations for LLM inference efficiency.
  • Collaborate with modeling and systems teams to measure and ship improvements.
  • Work across the inference stack to enhance core performance metrics.
  • Experiment with advanced performance techniques, including GPU/CUDA optimizations.

Requirements

  • 5+ years of experience writing high-performance, production-quality code.
  • Strong programming skills in C++ or Python; Rust/Go is a plus.
  • Experience with large language models and the LLM inference ecosystem.
  • Ability to diagnose and resolve performance bottlenecks.
  • A strong bias for action, with a focus on shipping fast and measuring impact.

Benefits

  • Open and inclusive culture and work environment.
  • Weekly lunch stipend, in-office lunches, and snacks.
  • Full health and dental benefits, including mental health support.
  • 100% parental leave top-up for up to 6 months.
  • Personal enrichment benefits for arts, culture, fitness, and workspace improvement.
  • Remote-flexible work arrangements with offices in major cities.
  • 6 weeks of vacation (30 working days).

Categories

AI & ML, Data Engineering