
Research Engineer - ML Infrastructure
Epsilon Labs, Inc.
6 months ago
Responsibilities
- Build and optimize distributed ML infrastructure for training foundation models on large-scale medical imaging datasets.
- Design and implement robust data pipelines to collect, process, and store large-scale multimodal medical imaging data.
- Build centralized data storage solutions with standardized formats for efficient retrieval and training.
- Create model inference pipelines and evaluation frameworks for research and production deployment.
- Collaborate with researchers to prototype new ideas and translate them into production-ready code.
- Own end-to-end delivery of ML systems from experimentation through deployment and monitoring.
Requirements
- 5+ years building ML infrastructure, data pipelines, or ML systems in production.
- Strong Python skills and expertise in PyTorch or JAX.
- Hands-on experience with data pipeline technologies such as Spark, Airflow, and BigQuery.
- Experience with distributed systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes).
- Track record of building scalable data systems and shipping production ML infrastructure.
- Ability to move quickly and handle competing priorities in a fast-paced environment.