GrepJob

ML Infrastructure Engineer

10 days ago
Boston, MA, USA
Mid Level / Senior

Base Salary

$145k - $165k/yr

Responsibilities

  • Define and own the long-term ML infrastructure roadmap.
  • Establish best practices for model lifecycle management and deployment.
  • Identify infrastructure gaps and design scalable solutions.
  • Design, build, and maintain production-grade model deployment systems.
  • Automate end-to-end ML lifecycle workflows.
  • Implement robust monitoring systems for model performance.
  • Operate across AWS and GCP environments for ML workloads.
  • Develop and maintain infrastructure-as-code for cloud environments.
  • Implement and optimize CI/CD workflows for ML automation.
  • Partner with cross-functional teams to support ML workflows.
  • Stay current on emerging MLOps practices and tools.

Requirements

  • 4+ years of experience in MLOps, ML infrastructure, or backend engineering.
  • Experience in cloud-native environments (AWS and/or GCP).
  • Proven track record in designing CI/CD pipelines for ML systems.
  • Strong experience with Amazon SageMaker, Docker, and Flask-based APIs.
  • Hands-on experience with ML lifecycle tooling such as MLflow or SageMaker Studio.
  • Experience managing container orchestration platforms like Kubernetes.
  • Strong programming experience in Python; additional languages are a plus.
  • Experience with infrastructure-as-code tools like Terraform or CloudFormation.
  • Familiarity with observability tools such as CloudWatch and Prometheus.
  • Experience managing GPU-based workloads.
  • Familiarity with data infrastructure tools like BigQuery.
  • Bonus: Experience with LLMs or generative AI pipelines.

Tech Stack

AWS, Datadog, Docker, Flask, GitHub Actions, GitLab CI/CD, Go, Google BigQuery, Google Cloud Platform, Grafana, Java, Kubernetes, MLflow, Prometheus, Python, Scala, Terraform

Categories

AI & ML, Data Engineering, DevOps