
Principal MLOps Engineer
Raft
Company Website, 14 days ago
Remote, United States +3 more
Staff+
Base Salary
$150k - $200k/yr
Responsibilities
- Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems.
- Mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging and monitoring.
- Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters.
- Support model serving and inference infrastructure for various ML use cases.
- Build and maintain CI/CD workflows for ML services and platform components.
- Collaborate with ML engineers and product teams to transition models from experimentation to deployment.
- Enhance observability, reliability, security, and maintainability across ML infrastructure.
- Evaluate and standardize runtime patterns and deployment architectures for production ML workloads.
- Contribute to infrastructure decisions across edge, on-prem, and cloud-hosted environments.
- Support compliance-driven deployment practices in defense environments.
- Engage with customers in the Department of War.
Requirements
- 7+ years of relevant experience in software engineering, platform engineering, DevOps, or MLOps.
- 5+ years of experience with Docker and Kubernetes in production environments.
- 5+ years of experience with enterprise cloud infrastructure on AWS, Azure, or a similar provider.
- Strong experience provisioning and troubleshooting Kubernetes clusters in production.
- Experience building and maintaining machine learning platforms or pipelines.
- Practical experience deploying machine learning workloads on Kubernetes.
- Experience managing GPU-enabled clusters or workloads.
- Strong understanding of Helm and Kubernetes deployment patterns.
- Strong scripting or programming skills, preferably in Python.
- Experience with modern software engineering practices including Git and CI/CD.
- Strong troubleshooting, systems thinking, and communication skills.
- Ability to work independently and collaboratively in a fast-paced environment.
- Ability to obtain and maintain a Top Secret clearance.
- Ability to obtain Security+ certification within the first 90 days.
Benefits
- Highly competitive salary.
- Fully covered health, dental, and vision insurance.
- 401(k) with company match.
- Take-as-you-need PTO plus 11 paid holidays.
- Education and training benefits.
- Annual budget for tech and gadget needs.
- Monthly box of snacks.
- Remote, hybrid, and flexible work options.
- Team off-site events in fun locations.
- Generous referral bonuses.