7 months ago
Base Salary
$180k - $350k/yr
Responsibilities
- Design a lakehouse architecture that handles over 100 PB of web crawl data.
- Build streaming pipelines that process billions of documents daily for real-time indexing.
- Architect the data layer for embedding training infrastructure on Ray.
- Scale ClickHouse deployment to manage analytical queries across petabytes of search logs.
Requirements
- Deep understanding of lakehouse architectures like Delta Lake, Iceberg, and Hudi.
- Experience in building and operating large-scale distributed data processing pipelines.
- Hands-on experience with streaming data systems such as Kafka or Flink.
- Familiarity with Ray, Spark, or ClickHouse at production scale.
- Focus on reliability to build systems that minimize operational issues.
Benefits
- Premium healthcare benefits including medical, dental, and vision.
- Fertility benefits offered to all employees.
- Monthly wellness stipend provided.
Tech Stack
Categories
AI & MLData Engineering
