Responsibilities
- Design and operate distributed storage systems for large-scale batch workloads.
- Build and maintain a modern, open-source data platform.
- Optimize utilization of storage resources and improve reliability of storage systems.
- Collaborate with teams to understand workload requirements and enhance platform capabilities.
- Contribute to platform tooling, automation, and CI/CD workflows.
Requirements
- 7+ years of experience building and operating distributed storage systems or modern data platforms.
- Experience with streaming platforms like Kafka or Pulsar.
- Fluent in Python and SQL, with experience in Trino and Apache Spark.
- Knowledge of table formats such as Iceberg, Delta Lake, and Hudi, and of interoperability tooling such as XTable.
- Experience with RDBMS optimization (Postgres, MySQL).
- Strong debugging and problem-solving skills in complex distributed systems.
- Ability to collaborate across teams and communicate technical concepts clearly.
Categories
AI & ML, Data Engineering