10 days ago
Boston, MA, USA
Mid Level / Senior
Base Salary
$145k - $165k/yr
Responsibilities
- Define and own the long-term ML infrastructure roadmap.
- Establish best practices for model lifecycle management and deployment.
- Identify infrastructure gaps and design scalable solutions.
- Design, build, and maintain production-grade model deployment systems.
- Automate end-to-end ML lifecycle workflows.
- Implement robust monitoring systems for model performance.
- Operate across AWS and GCP environments for ML workloads.
- Develop and maintain infrastructure-as-code for cloud environments.
- Implement and optimize CI/CD workflows for ML automation.
- Partner with cross-functional teams to support ML workflows.
- Stay current on emerging MLOps practices and tools.
Requirements
- 4+ years of experience in MLOps, ML infrastructure, or backend engineering.
- Experience in cloud-native environments (AWS and/or GCP).
- Proven track record in designing CI/CD pipelines for ML systems.
- Strong experience with Amazon SageMaker, Docker, and Flask-based APIs.
- Hands-on experience with ML lifecycle tooling like MLflow or SageMaker Studio.
- Experience managing container orchestration platforms like Kubernetes.
- Strong programming experience in Python; additional languages are a plus.
- Experience with infrastructure-as-code tools like Terraform or CloudFormation.
- Familiarity with observability tools such as CloudWatch and Prometheus.
- Experience managing GPU-based workloads.
- Familiarity with data infrastructure tools like BigQuery.
- Bonus: Experience with LLMs or generative AI pipelines.