AI Engineer, Data Pipeline - 11318

Bengaluru, India

Mid Level / Senior

H1B Sponsor

Responsibilities

Build data ingestion pipelines to extract and transform enterprise data.
Implement data cleansing and normalization routines.
Write and maintain ETL jobs using Spark/PySpark on cloud infrastructure.
Implement data validation and quality checks at each pipeline stage.
Build automated data export jobs for model training datasets.
Support feature extraction from enterprise schemas.
Monitor pipeline health, troubleshoot failures, and optimize performance.
Document data lineage, schemas, and transformation logic.

Requirements

3+ years of software engineering experience.
Experience with Python and data processing (pandas, PySpark, or equivalent).
Familiarity with SQL and relational databases (MySQL, PostgreSQL).
Experience with cloud data services (object storage, managed Spark, managed ETL, or equivalent).
Understanding of ETL/ELT patterns and data pipeline design.
Experience with data formats (Parquet, JSON, Avro).
Strong attention to data quality and testing.
BS in Computer Science or equivalent experience.

Tech Stack

Apache Spark, MySQL, PostgreSQL, Python, SQL

Categories

AI & ML, Data Engineering, Testing