Epsilon Labs, Inc.

Research Engineer - ML Infrastructure

6 months ago

Responsibilities

  • Build and optimize distributed ML infrastructure for training foundation models on large-scale medical imaging datasets.
  • Design and implement robust data pipelines to collect, process, and store large-scale multimodal medical imaging data.
  • Build centralized data storage solutions with standardized formats for efficient retrieval and training.
  • Create model inference pipelines and evaluation frameworks for research and production deployment.
  • Collaborate with researchers to prototype new ideas and translate them into production-ready code.
  • Own end-to-end delivery of ML systems from experimentation through deployment and monitoring.

Requirements

  • 5+ years building ML infrastructure, data pipelines, or ML systems in production.
  • Strong Python skills and expertise in PyTorch or JAX.
  • Hands-on experience with data pipeline technologies like Spark, Airflow, and BigQuery.
  • Experience with distributed systems, cloud infrastructure (AWS/GCP), and containerization (Docker/Kubernetes).
  • Track record of building scalable data systems and shipping production ML infrastructure.
  • Ability to move quickly and handle competing priorities in a fast-paced environment.

Tech Stack

Apache Airflow, Apache Spark, AWS, Databricks, Docker, Google BigQuery, Google Cloud Platform, Kubernetes, Python, PyTorch, Snowflake

Categories

AI & ML, Data Engineering, DevOps