Cantina

Machine Learning Engineer (Singapore)

Posted about 2 months ago
Singapore, Singapore
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design and scale distributed data pipelines for preprocessing and dataset generation.
  • Own workflow orchestration, job scheduling, monitoring, and failure recovery for data processing jobs.
  • Implement and maintain containerized pipeline infrastructure using Kubernetes.
  • Optimize cloud-based data storage and movement for cost and operational efficiency.
  • Define and implement best practices for dataset storage layout and versioning.
  • Design curation pipelines for selecting and filtering video and image content.
  • Build and improve VLM-based captioning and metadata generation workflows.
  • Develop quality and aesthetic scoring models for data selection.
  • Build tooling to support deduplication workflows at scale.
  • Analyze dataset composition and iterate on curation logic.

Requirements

  • Strong hands-on experience with large-scale data systems and pipelines for machine learning.
  • Experience with distributed data processing frameworks like PySpark or Ray.
  • Familiarity with containerization and orchestration tools such as Docker and Kubernetes.
  • Experience with cloud-based data storage and compute (AWS, GCS, Azure).
  • Experience with VLM-based captioning pipelines and quality scoring models.
  • Familiarity with CLIP-based filtering and semantic data selection techniques.
  • Familiarity with video processing tools like FFmpeg and OpenCV.
  • Proficiency in Python.
  • Strong problem-solving, communication, and documentation skills.

Benefits

  • Competitive salary and generous company equity.
  • Personal time off and paid holidays.
  • Health insurance.
  • Global travel insurance for international travel.
  • Monthly spending stipend of $500.
  • All necessary equipment for your home office.

Tech Stack

Apache Airflow, AWS, Azure, Docker, Kubernetes, OpenCV, Python

Categories

AI & ML, Data Engineering, Data Science