9 days ago
Remote, Worldwide +2 more
Mid Level / Senior
H1B Sponsor
Responsibilities
- Define specifications and audit large-scale audio/text datasets.
- Build automated quality metrics and validation dashboards.
- Train models to tag, score, and filter data effectively.
- Apply data cleaning techniques to maintain dataset integrity.
- Optimize data selection through sampling and active learning.
- Integrate quality gates into training and evaluation pipelines.
Requirements
- Strong experience in building ML-driven data quality systems for audio/speech.
- Proficiency in Python and PyTorch, with experience training SSL-ASR models.
- Familiarity with audio/speech fundamentals and relevant libraries.
- Data engineering skills at scale, with tools like Spark and SQL.
- Experience with ASR/TTS metrics and dataset validation.
- Ability to translate ambiguous requirements into measurable improvements.
