8 days ago
Bucharest, Romania
Senior / Mid Level
H1B Sponsor
Responsibilities
- Design and build large-scale, distributed crawling bots and infrastructure.
- Develop and maintain data pipelines to extract data from various sources.
- Help unify heterogeneous documents into a coherent data schema.
- Preprocess and normalize raw data for classification and search indexing.
- Build APIs to expose structured, classified data via Elasticsearch/OpenSearch.
- Collaborate with ML/NLP teams to integrate classification models.
- Automate workflows using Apache Airflow and deploy solutions on Kubernetes.
- Optimize and scale data pipelines using Spark for large datasets.
Requirements
- 4+ years of Python experience building crawling/scraping solutions.
- Experience working with REST APIs and PDF processing.
- Proficiency in data processing and search technologies such as Elasticsearch.
- Experience with React is preferred.
- Strong problem-solving skills for handling anti-scraping mechanisms.
- Hands-on experience with AWS or GCP.
Benefits
- Enjoy the flexibility of remote work.
- Competitive base salaries that reflect your value.
- Generous Paid Time Off for well-rested performance.
- Comprehensive health benefits including Medical, Dental, and Vision.
- 401(k) Retirement Savings Plan with employer match.
- Support for continuing education and professional development.
Tech Stack
- Apache Airflow
- Apache Spark
- AWS
- Elasticsearch
- Google Cloud Platform
- Kubernetes
- Python
- PyTorch
- React
- spaCy
- TensorFlow
