Reflection

Member of Technical Staff - Web Crawl Engineer

about 4 hours ago
San Francisco, CA, USA
Mid Level / Senior
H-1B visa sponsorship available

Responsibilities

  • Build and operate web-scale crawling infrastructure for data collection across billions of URLs.
  • Design and optimize URL discovery, prioritization, scheduling, and crawl orchestration systems.
  • Develop distributed crawlers that respect site constraints while acquiring content efficiently.
  • Build systems for content extraction, rendering, parsing, and normalization across diverse web formats.
  • Improve crawl coverage, freshness, efficiency, and quality through measurement and experimentation.
  • Design infrastructure for large-scale recrawling, change detection, and incremental updates.
  • Analyze crawl performance and web coverage to identify gaps and opportunities for improvement.
  • Build observability, monitoring, and reliability systems for large-scale crawl operations.
  • Debug production issues and enhance the performance and resilience of crawling infrastructure.

Requirements

  • Experience building large-scale web crawling or internet-scale data collection systems.
  • Strong understanding of crawling architectures and distributed crawl coordination.
  • Experience with large-scale distributed systems using technologies like Ray, Spark, or similar frameworks.
  • Familiarity with content extraction, HTML parsing, and modern web technologies.
  • Experience running production systems that process petabyte-scale datasets.
  • Strong systems engineering skills, including reliability and performance optimization.
  • Experience designing experiments to improve crawl quality and efficiency.
  • Excellent communication skills and the ability to reason about system tradeoffs.

Benefits

  • Top-tier compensation with salary and equity structured to retain talent.
  • Comprehensive medical, dental, vision, life, and disability insurance.
  • Fully paid parental leave for all new parents and financial support for family planning.
  • Paid time off, relocation support, and additional perks for work-life balance.
  • Daily lunch and dinner provided, along with regular off-sites and team celebrations.

Tech Stack

Apache Beam, Apache Flink, Apache Spark, HTML

Categories

AI & ML, Data Engineering