13 days ago
Palo Alto, CA, USA
Mid Level
H1B Sponsor
Base Salary
$180k - $440k/yr
Responsibilities
- Scale synthetic coding data to trillions of tokens, with large-scale verification in Docker.
- Distill the intelligence of flagship models into flash models through synthetic data generation.
- Optimize mid-training data mixtures to raise the performance ceiling for reinforcement learning (RL).
- Engineer long-context data recipes.
- Develop robust and diverse evaluations for mid-training checkpoints.
Requirements
- Expertise in ML and large-model scaling, with familiarity across the various kinds of scaling laws.
- Strong ability to design ML experiments.
- Familiarity with state-of-the-art techniques for curating AI training data for text, image, audio, and video modalities.
- Strong engineering abilities in Spark, Ray, and other frameworks for large-scale data processing.
Benefits
- Equity in the company.
- Comprehensive medical, vision, and dental coverage.
- Access to a 401(k) retirement plan.
- Short- and long-term disability insurance.
- Life insurance and various other discounts and perks.
Tech Stack
Apache Spark, Docker
Categories
AI & ML, Data Engineering