27 days ago
Santa Clara, CA, USA
Senior / Mid Level
Base Salary
$124k - $210k/yr
Responsibilities
- Architect and build scalable, end-to-end data pipelines for petabyte-scale raw data.
- Evolve data storage solutions using Apache Iceberg and Lance.
- Optimize data loading and pre-fetching strategies for large-scale training.
- Support the transition of raw vehicle logs into model-ready training sets.
Requirements
- BS/MS/PhD in Computer Science or a related field.
- 3-5 years of industry experience.
- Proficiency in Python, C++, or Java, with a focus on concurrent programming.
- Hands-on experience with distributed processing frameworks like Ray or Spark.
- Familiarity with Data Lakehouse concepts and technologies.