
ML Ops Engineer (EMEA Remote)
Pragmatike
22 days ago
Prague, Czechia +7 more
Mid Level / Senior
Responsibilities
- Build and operate production-grade model serving infrastructure using frameworks like vLLM, TGI, or Triton.
- Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models.
- Develop and maintain auto-scaling systems and intelligent request routing layers.
- Optimize GPU utilization, memory efficiency, and network throughput.
- Design observability systems for tracking inference metrics and system health.
- Manage model registries and CI/CD pipelines for automated model deployments.
- Own the full lifecycle of ML systems, including operational support.
- Define engineering best practices in a fast-moving startup environment.
Requirements
- 4+ years of experience in ML Ops, Platform Engineering, or similar roles focused on ML systems.
- Hands-on experience with model serving frameworks like vLLM, TGI, or Triton.
- Strong background in container orchestration and operating GPU-based workloads.
- Experience with ML Ops tooling, including model registries and automated deployment pipelines.
- Proficiency in Python and infrastructure-as-code tools like Terraform or Helm.
- Strong understanding of distributed systems and production reliability engineering.
- Ability to effectively use AI coding assistants for development and debugging.
- Ownership mindset with the ability to operate independently in a remote-first environment.
Benefits
- Take ownership of critical infrastructure for a rapidly scaling AI-native cloud platform.
- Build foundational ML inference systems from the ground up in a high-growth startup.
- Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.
- Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems.
- Influence core engineering decisions and define best practices for scalability.