Amsterdam, Netherlands or London, United Kingdom
Senior
Responsibilities
- Design, implement, and operate web-scale crawling systems for acquiring content from the internet.
- Build ingestion workflows for internal and external data sources, including crawlers and partner integrations.
- Develop crawl scheduling, prioritization, recrawl policies, and freshness strategies.
- Build systems for URL discovery, deduplication, content extraction, and crawl orchestration.
- Ensure reliable operation of crawling infrastructure under high-throughput conditions.
- Define observability and quality metrics for crawl coverage, freshness, throughput, and content quality.
- Monitor resource usage, bandwidth consumption, and infrastructure cost.
- Collaborate with indexing and ML teams to ensure acquired content meets retrieval and ranking requirements.
- Enable safe experimentation with crawling strategies and content acquisition policies.
Requirements
- 5+ years of experience building backend or distributed systems.
- Strong Go or C++ expertise.
- Experience with large-scale distributed systems (10k+ RPS, billions of URLs, high-throughput pipelines).
- Understanding of web protocols (HTTP, DNS, TLS), crawling, scraping, and content extraction.
- Experience operating production systems and debugging failures in distributed environments.
- Strong understanding of scalability, fault tolerance, and resource management.
Benefits
- Competitive compensation.
- Career growth and learning opportunities.
- Flexibility and ownership.
- Collaborative and innovative culture.
- Opportunity to work on impactful AI projects.
- International environment and talented teams.