Staff Software Engineer – Data Platform Engineer

about 2 months ago

Bengaluru, India

Staff+

H1B Sponsor

Responsibilities

Design and implement foundational frameworks for ingestion, orchestration, schema validation, and metadata management.
Build robust, scalable pipelines for Change Data Capture (CDC) using Debezium integrated with Kafka and Spark.
Develop internal tooling to standardize and accelerate data ingestion, transformation, and publishing workflows.
Optimize data serving layers powered by Trino, including metadata syncing, security filtering, and performance tuning.
Partner with SRE and Infra teams to build autoscaling, self-healing, and cost-optimized Spark jobs on AWS EMR.
Implement observability features — logs, metrics, alerts — for critical platform services and data pipelines.
Define and enforce standards for schema evolution, lineage tracking, and data governance.
Automate platform operations using CI/CD pipelines, metadata-driven configurations, and infrastructure-as-code.

Requirements

7–12 years of experience in data platform, infrastructure, or backend systems engineering roles.
Strong systems engineering fundamentals — distributed systems, fault tolerance, performance tuning.
Expertise in Apache Spark with production experience on AWS EMR.
Experience building or managing CDC pipelines using tools like Debezium, Kafka Connect, or custom connectors.
Familiarity with streaming systems such as Kafka, Kinesis, or Flink.
Experience serving data using Trino or Presto in a multi-tenant environment.
Proficiency in Python, Java, or Scala, and in Unix/Linux systems.
Strong understanding of open table formats like Delta Lake or Apache Iceberg.
Exposure to infrastructure as code (Terraform, CloudFormation) and CI/CD automation.

Apache Kafka, Apache Spark, AWS, Java, Python, Scala, Terraform