San Francisco, CA, USA
Mid Level
Responsibilities
- Design and operate high-performance inference and training infrastructure.
- Build reliable systems for deploying and scaling ML workloads globally.
- Work on GPU scheduling and distributed systems.
- Optimize performance and cost across compute, networking, and storage layers.
- Collaborate with engineers to enhance the capabilities of small models.
Requirements
- 2+ years of experience writing high-quality production code.
- Strong experience with cloud infrastructure (AWS, GCP, Azure, or equivalent).
- Experience with data science and systems optimization.
- Familiarity with ML infrastructure and GPUs is a plus.
- Willingness to work out of the SF office in the Financial District (FiDi).
