about 1 month ago
San Francisco, CA, USA
Mid Level / Senior
Base Salary
$200k - $275k/yr
Responsibilities
- Build in-house tooling to support post-training of AI models.
- Work across the stack on systems-level concepts like Kubernetes and networking.
- Employ various techniques to enhance model efficiency and quality.
- Collaborate with researchers to derive specifications and solve complex problems.
- Profile and improve performance of distributed GPU programs.
Requirements
- Deep understanding of modern ML techniques and tools for training transformers.
- Advanced experience with a tensor/array computation library like PyTorch or TensorFlow.
- Detailed understanding of transformer training parallelism strategies.
- Experience profiling and improving performance of distributed GPU programs.
- Familiarity with HPC and distributed computing platforms like Slurm and Kubernetes.
- Solid fundamentals in operating systems concepts.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and their dependents.
- Flexible PTO policy, including a company-wide winter break.
- Paid parental leave.
- Fertility and family-building stipend through Carrot.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups for learning and networking opportunities.
