Senior AI Engineer, NLP & Training Data - 11316
Coupa Software
about 20 hours ago
Bengaluru, India
Senior
H1B Sponsor
Responsibilities
- Design and implement training data generation pipelines, including synthetic data generation.
- Build data labeling and annotation workflows with quality validation loops.
- Convert enterprise data into formats suitable for model training.
- Implement active learning strategies to identify high-value training examples.
- Collaborate with domain experts to validate training data quality and relevance.
- Build automated data quality checks for coverage, balance, and consistency.
- Design training data versioning and lineage tracking.
- Analyze model evaluation results to identify training data gaps.
Requirements
- 5+ years of software engineering experience, with 2+ years in NLP, data science, or ML data engineering.
- Experience with text processing, tokenization, and NLP pipelines.
- Hands-on experience with data labeling tools and annotation workflows.
- Experience generating synthetic training data using language model APIs.
- Understanding of instruction-tuning and training data quality metrics.
- Proficiency in Python (pandas, PySpark).
- Experience with data versioning tools is a plus.
- BS/MS in Computer Science, NLP, or equivalent experience.
Tech Stack
Pandas, Python
Categories
AI & ML, Data Engineering, Data Science