Rackspace

AI Model Serving Specialist

Remote, India
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Package and deploy ML/LLM models on Triton, vLLM, or KServe within Kubernetes clusters.
  • Tune serving performance to meet latency and throughput SLAs.
  • Manage GPU resource allocation and multi-tenancy on VMware VCF 9, NSX-T, and vSAN ESA.
  • Implement RBAC, encryption, and compliance controls for private cloud customers.
  • Integrate models with Rackspace’s Unified Inference API for multi-tenant routing.
  • Configure telemetry for GPU utilization and error monitoring.
  • Assist solution architects in onboarding customers and creating reference patterns.
  • Stay current with emerging model-serving frameworks and contribute to automation scripts.
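The telemetry responsibility above pairs the listing's own Prometheus/Grafana stack with GPU monitoring. A minimal sketch of what exporting GPU utilization to Prometheus might look like, assuming a `record_utilization` helper fed by an NVML or `nvidia-smi` poller (the metric and helper names are illustrative, not part of the role description):

```python
# Hypothetical sketch: exposing GPU utilization as a Prometheus metric.
# The metric name, labels, and sample values here are illustrative.
from prometheus_client import Gauge, CollectorRegistry, generate_latest

registry = CollectorRegistry()

# One gauge per GPU, labeled by device index.
gpu_utilization = Gauge(
    "gpu_utilization_percent",
    "GPU compute utilization as reported by the driver",
    ["gpu"],
    registry=registry,
)

def record_utilization(samples: dict) -> None:
    """Record a {gpu_index: utilization_percent} sample set."""
    for index, percent in samples.items():
        gpu_utilization.labels(gpu=str(index)).set(percent)

# Fabricated sample values; a real exporter would poll NVML on a timer
# and serve the registry over HTTP for Prometheus to scrape.
record_utilization({0: 87.5, 1: 12.0})
exposition = generate_latest(registry).decode()
```

In production this registry would be served via `prometheus_client.start_http_server` (or mounted in the serving container) so Prometheus can scrape it and Grafana can chart per-GPU utilization and error rates.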

Requirements

  • Hands-on experience with NVIDIA Triton, vLLM, or similar serving stacks.
  • Strong knowledge of Kubernetes, GPU scheduling, and CUDA/MIG.
  • Familiarity with VMware VCF 9, NSX-T networking, and vSAN storage classes.
  • Proficiency in Python and containerization (Docker).
  • Understanding of observability stacks and FinOps principles.
  • Exposure to RAG architectures and secure multi-tenant environments.
  • Excellent problem-solving and customer-facing communication skills.

Tech Stack

Docker, Grafana, Kubernetes, Prometheus, Python

Categories

AI & ML, DevOps