Staff Software Engineer, ML Performance & Systems

2 months ago

San Francisco, CA, USA

Staff+

H1B Sponsor

Base Salary

$180k - $250k/yr

Responsibilities

Help fal maintain its position at the frontier of performance for generative media models.
Design and implement novel approaches to model serving architecture on top of our in-house inference engine.
Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.
Work closely with our Applied ML team and customers to ensure their workloads benefit from our accelerator.

Requirements

Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.
Deep understanding of the cutting-edge ML infrastructure stack, including model compilation, quantization, and serving architectures.
Fundamental understanding of the underlying hardware, particularly NVIDIA-based systems.
Proficiency in Triton, or willingness to learn it given comparable experience in lower-level accelerator programming.
Familiarity with multi-dimensional model parallelism techniques.
Knowledge of the internals of Ring Attention, FlashAttention-3 (FA3), and FusedMLP implementations.

Benefits

Interesting and challenging work.
Competitive salary and equity.
Ample opportunities for learning and growth.
Relocation assistance to San Francisco.
Health, dental, and vision insurance (US).
Regular team events and offsites.

Categories

AI & ML, Data Engineering