GrepJob
Exa

Software Engineer, Distributed Data Systems

7 months ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$180k - $350k/yr

Responsibilities

  • Design a lakehouse architecture that handles over 100 PB of web crawl data.
  • Build streaming pipelines that process billions of documents daily for real-time indexing.
  • Architect the data layer for embedding training infrastructure on Ray.
  • Scale the ClickHouse deployment to serve analytical queries across petabytes of search logs.

Requirements

  • Deep understanding of lakehouse architectures and table formats such as Delta Lake, Iceberg, and Hudi.
  • Experience building and operating large-scale distributed data processing pipelines.
  • Hands-on experience with streaming data systems such as Kafka or Flink.
  • Familiarity with Ray, Spark, or ClickHouse at production scale.
  • A focus on reliability, with a track record of building systems that minimize operational issues.

Benefits

  • Premium healthcare benefits including medical, dental, and vision.
  • Fertility benefits offered to all employees.
  • Monthly wellness stipend.

Tech Stack

Apache Flink, Apache Kafka, Apache Spark, ClickHouse, Rust

Categories

AI & ML, Data Engineering