Pragmatike

ML Ops Engineer (EMEA Remote)

22 days ago
Prague, Czechia +7 more
Mid Level / Senior

Responsibilities

  • Build and operate production-grade model serving infrastructure using frameworks like vLLM, TGI, or Triton.
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models.
  • Develop and maintain auto-scaling systems and intelligent request routing layers.
  • Optimize GPU utilization, memory efficiency, and network throughput.
  • Design observability systems for tracking inference metrics and system health.
  • Manage model registries and CI/CD pipelines for automated model deployments.
  • Own the full lifecycle of ML systems, including operational support.
  • Define engineering best practices in a fast-moving startup environment.

Requirements

  • 4+ years of experience in ML Ops, Platform Engineering, or similar roles focused on ML systems.
  • Hands-on experience with model serving frameworks like vLLM, TGI, or Triton.
  • Strong background in container orchestration and operating GPU-based workloads.
  • Experience with MLOps tooling including model registries and automated deployment pipelines.
  • Proficiency in Python and infrastructure-as-code tools like Terraform or Helm.
  • Strong understanding of distributed systems and production reliability engineering.
  • Ability to effectively use AI coding assistants for development and debugging.
  • Ownership mindset with the ability to operate independently in a remote-first environment.

Benefits

  • Take ownership of critical infrastructure for a rapidly scaling AI-native cloud platform.
  • Build foundational ML inference systems from the ground up in a high-growth startup.
  • Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.
  • Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems.
  • Influence core engineering decisions and define best practices for scalability.

Tech Stack

Helm, MLflow, Python, Terraform

Categories