Senior Software Engineer (Agentic Search) - Crawler

Amsterdam, Netherlands or London, United Kingdom · Senior

Responsibilities

Design, implement, and operate web-scale crawling systems for acquiring content from the internet.
Build ingestion workflows for internal and external data sources, including crawlers and partner integrations.
Develop crawl scheduling, prioritization, recrawl policies, and freshness strategies.
Build systems for URL discovery, deduplication, content extraction, and crawl orchestration.
Ensure reliable operation of crawling infrastructure under high-throughput conditions.
Define observability and quality metrics for crawl coverage, freshness, throughput, and content quality.
Monitor resource usage, bandwidth consumption, and infrastructure cost.
Collaborate with indexing and ML teams to ensure acquired content meets retrieval and ranking requirements.
Enable safe experimentation with crawling strategies and content acquisition policies.

Requirements

5+ years of experience building backend or distributed systems.
Strong Go or C++ expertise.
Experience with large-scale distributed systems (10k+ RPS, billions of URLs, high-throughput pipelines).
Understanding of web protocols (HTTP, DNS, TLS), crawling, scraping, and content extraction.
Experience operating production systems and debugging failures in distributed environments.
Strong understanding of scalability, fault tolerance, and resource management.

Technologies: Apache Beam, Apache Flink, Apache Kafka, Apache Spark, C++, Go, RabbitMQ