GrepJob
Cohere

Senior Member of Technical Staff, Web Data

Cohere
Apply
about 2 hours ago
Toronto, Canada +3 moreSenior / Staff+
H1B Sponsor

Responsibilities

  • Maintain large-scale pipelines for processing web corpora.
  • Work on filtering and quality-scoring systems to identify high-value web documents.
  • Analyze web data composition across domains, languages, and time periods.
  • Develop and maintain highly-performant deduplication pipelines.
  • Collaborate with cross-functional teams to ensure data pipelines meet model demands.

Requirements

  • Strong software engineering skills with proficiency in Python.
  • Experience building data pipelines.
  • Familiarity with data processing frameworks like Apache Spark or Pandas.
  • Experience working with large-scale web datasets.
  • Knowledge of data quality assessment techniques.

Benefits

  • An open and inclusive culture and work environment.
  • Weekly lunch stipend, in-office lunches, and snacks.
  • Full health and dental benefits, including mental health support.
  • 100% Parental Leave top-up for up to 6 months.
  • Personal enrichment benefits for arts, culture, fitness, and workspace improvement.
  • Remote-flexible work options with offices in major cities.
  • 6 weeks of vacation (30 working days).

Tech Stack

Apache BeamApache SparkPandasPython

Categories

AI & MLData Engineering