Pragmatike

AI Infrastructure Engineer (GPU) - Remote EMEA

17 days ago
Prague, Czechia +8 more
Mid Level / Senior

Responsibilities

  • Build and operate production-grade model serving infrastructure using frameworks like vLLM, TGI, or Triton.
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies.
  • Develop and maintain auto-scaling systems and intelligent request routing layers.
  • Optimize GPU utilization, memory efficiency, and network throughput.
  • Design observability systems for tracking inference metrics and system health.
  • Manage model registries and CI/CD pipelines for automated deployments.
  • Own the full lifecycle of ML systems, including operational support.
  • Define engineering best practices and contribute to platform scalability.

Requirements

  • 4+ years of experience in MLOps, Platform Engineering, or similar roles focused on ML systems.
  • Hands-on experience with model serving frameworks like vLLM, TGI, or Triton.
  • Strong background in container orchestration and GPU-based workloads.
  • Experience with MLOps tooling including model registries and automated deployment pipelines.
  • Proficiency in Python and infrastructure-as-code tools like Terraform or Helm.
  • Strong understanding of distributed systems and production reliability engineering.
  • Ability to effectively use AI coding assistants for development and debugging.
  • Ownership mindset with the ability to operate independently in a remote environment.

Benefits

  • Take ownership of critical infrastructure for a rapidly scaling AI-native cloud platform.
  • Build foundational ML inference systems from the ground up.
  • Work at the intersection of distributed systems, GPU computing, and sustainable cloud architecture.
  • Gain deep expertise in next-generation AI infrastructure and large-scale model serving systems.
  • Influence core engineering decisions and define scalable best practices.

Tech Stack

Helm, MLflow, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps