San Francisco, CA, USA or New York, NY, USA
Mid Level / Senior
Responsibilities
- Build and maintain robust multi-stage asynchronous workflows for data generation, training, and evaluations.
- Rationalize machine learning systems design and software architecture.
- Identify blockers and build scalable solutions for foundation models.
- Act as the liaison between research scientists and the infrastructure team.
Requirements
- At least two years of relevant industry experience.
- Highly fluent in PyTorch and JAX.
- Experience with asynchronous programming concepts.
- Strong views on library design with a focus on clean abstractions.
- Proven track record of well-documented code in publicly observable artifacts, such as GitHub repositories.
- General understanding of scalable and reliable machine learning systems.
