
AI Model Serving Specialist
Rackspace
about 15 hours ago
Remote, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Package and deploy ML/LLM models on Triton, vLLM, or KServe within Kubernetes clusters.
- Tune serving performance to meet latency and throughput SLAs.
- Manage GPU resource allocation and multi-tenancy with VMware VCF 9, NSX-T, and vSAN ESA.
- Implement RBAC, encryption, and compliance controls for private cloud customers.
- Integrate models with Rackspace’s Unified Inference API for multi-tenant routing.
- Configure telemetry for GPU utilization and error monitoring.
- Assist solution architects in onboarding customers and creating reference patterns.
- Stay current with emerging model-serving frameworks and contribute to automation scripts.
Requirements
- Hands-on experience with NVIDIA Triton, vLLM, or similar serving stacks.
- Strong knowledge of Kubernetes, GPU scheduling, and CUDA/MIG.
- Familiarity with VMware VCF 9, NSX-T networking, and vSAN storage classes.
- Proficiency in Python and containerization (Docker).
- Understanding of observability stacks and FinOps principles.
- Exposure to RAG architectures and secure multi-tenant environments.
- Excellent problem-solving and customer-facing communication skills.
Tech Stack
Docker, Grafana, Kubernetes, Prometheus, Python
Categories
AI & ML, DevOps