Research Engineer - ML Infrastructure

8 months ago

Responsibilities

Build and optimize distributed ML infrastructure for training foundation models on large-scale medical imaging datasets.
Design and implement robust data pipelines to collect, process, and store large-scale multimodal medical imaging data.
Build centralized data storage solutions with standardized formats for efficient retrieval and training.
Create model inference pipelines and evaluation frameworks for research and production deployment.
Collaborate with researchers to prototype new ideas and translate them into production-ready code.
Own end-to-end delivery of ML systems from experimentation through deployment and monitoring.

Requirements

5+ years of experience building ML infrastructure, data pipelines, or ML systems in production.
Strong Python skills and expertise in PyTorch or JAX.
Hands-on experience with data pipeline technologies such as Spark, Airflow, and BigQuery.
Experience with distributed systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes).
Track record of building scalable data systems and shipping production ML infrastructure.
Ability to move quickly and handle competing priorities in a fast-paced environment.