12 days ago
Santa Clara, CA, USA
Senior / Staff+
Base Salary
$203k - $344k/yr
Responsibilities
- Architect and build scalable, end-to-end data pipelines for PB-scale raw data.
- Evolve data storage solutions using Apache Iceberg and Lance.
- Optimize data loading and pre-fetching strategies for large-scale training.
- Support the transition of foundation model data into actionable training sets.
Requirements
- BS/MS/PhD in Computer Science or a related field.
- 5-8+ years of industry experience.
- Proficient in Python, C++, or Java with a focus on concurrent programming.
- Hands-on experience with distributed processing frameworks like Ray or Spark.
- Familiarity with Data Lakehouse concepts and technologies like Iceberg and Lance.
Benefits
- A fun, supportive, and engaging work environment.
- Access to infrastructure and computational resources.
- Opportunity to work on cutting-edge technologies.
- Chance to make a significant impact on autonomous driving.
- Competitive compensation package including snacks, lunches, and fun activities.