Bengaluru, India
Senior / Mid Level
H1B Sponsor
Responsibilities
- Design and implement foundational frameworks for ingestion, orchestration, schema validation, and metadata management.
- Build robust, scalable pipelines for Change Data Capture (CDC) using Debezium integrated with Kafka and Spark.
- Optimize data serving layers powered by Trino, including metadata syncing, security filtering, and performance tuning.
- Partner with SRE and Infra teams to build autoscaling, self-healing, and cost-optimized Spark jobs on AWS EMR.
- Implement observability (logs, metrics, and alerts) for critical platform services and data pipelines.
- Define and enforce standards for schema evolution, lineage tracking, and data governance.
- Automate platform operations using CI/CD pipelines, metadata-driven configurations, and infrastructure-as-code.
Requirements
- 5-7 years of experience in data platform, infrastructure, or backend systems engineering roles.
- Strong systems engineering fundamentals: distributed systems, fault tolerance, and performance tuning.
- Expertise in Apache Spark with production experience on AWS EMR.
- Experience building or managing CDC pipelines using tools like Debezium, Kafka Connect, or custom connectors.
- Familiarity with streaming systems such as Kafka, Kinesis, or Flink.
- Experience serving data using Trino or Presto in a multi-tenant environment.
- Proficiency in Python, Java, or Scala, and with Unix/Linux systems.
- Strong understanding of open table formats like Delta Lake or Apache Iceberg.
- Exposure to infrastructure as code (Terraform, CloudFormation) and CI/CD automation.
Tech Stack
Apache Kafka, Apache Spark, AWS, Java, Python, Scala, Terraform
Categories
AI & ML, Backend, Data Engineering, DevOps