Base Salary
$180k - $250k/yr
Responsibilities
- Help fal maintain its frontier position on model performance for generative media models.
- Design and implement novel approaches to model serving architecture on top of our in-house inference engine.
- Develop performance monitoring and profiling tools to identify bottlenecks and optimization opportunities.
- Work closely with our Applied ML team and customers to ensure their workloads benefit from our accelerator.
Requirements
- Strong foundation in systems programming with expertise in identifying and fixing bottlenecks.
- Deep understanding of the cutting-edge ML infrastructure stack, including model compilation, quantization, and serving architectures.
- Fundamental understanding of the underlying hardware, particularly NVIDIA-based systems.
- Proficiency in Triton, or willingness to learn it backed by comparable experience in lower-level accelerator programming.
- Familiarity with multi-dimensional model parallelism techniques.
- Knowledge of the internals of Ring Attention, FA3, and FusedMLP implementations.
Benefits
- Interesting and challenging work.
- Competitive salary and equity.
- A lot of learning and growth opportunities.
- Relocation assistance to San Francisco.
- Health, dental, and vision insurance (US).
- Regular team events and offsites.
Tech Stack
PyTorch
Categories
AI & ML
Data Engineering
