3 days ago
Base Salary
$180k - $250k/yr
Responsibilities
- Contribute to the multi-modal data strategy across pre-training and post-training.
- Design and build scalable, high-throughput data pipelines for text, audio, and video.
- Partner with research and inference teams to co-design data systems with training infrastructure.
- Drive rigorous standards for data quality with feedback loops between dataset characteristics and model behavior.
- Identify and integrate novel datasets from external vendors and partners.
Requirements
- Hands-on experience with ML data infrastructure, including training data pipelines and dataset versioning.
- Working knowledge of multimodal data, particularly audio formats and preprocessing.
- Strong engineering execution using modern practices, producing clean, well-tested code.
- Track record of driving significant technical projects end-to-end in a fast-moving environment.
- Familiarity with building and evaluating datasets for generative models.
Benefits
- Competitive base salary alongside an attractive equity package.
- Monthly commuter allowance to assist with travel to the office.
- Flexible PTO policy allowing ample time to recharge.
- Daily meals and snacks provided.
- Unique perks like your own personal Yoshi.
Categories
AI & ML, Data Engineering
