
AI Model Serving Specialist
Rackspace
about 15 hours ago
Remote, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Package and deploy ML/LLM models on Triton, vLLM, or KServe within Kubernetes clusters.
- Tune serving performance to meet latency and throughput SLAs.
- Manage GPU resource allocation and multi-tenancy with VMware VCF 9, NSX-T, and vSAN ESA.
- Implement RBAC, encryption, and compliance controls for private cloud customers.
- Integrate models with Rackspace’s Unified Inference API for multi-tenant routing.
- Configure telemetry for GPU utilization and error monitoring.
- Assist solution architects in onboarding customers and creating reference patterns.
- Stay current with emerging model-serving frameworks and contribute to automation scripts.
Requirements
- Hands-on experience with NVIDIA Triton, vLLM, or similar serving stacks.
- Strong knowledge of Kubernetes, GPU scheduling, and CUDA/MIG.
- Familiarity with VMware VCF 9, NSX-T networking, and vSAN storage classes.
- Proficiency in Python and containerization (Docker).
- Understanding of observability stacks and FinOps principles.
- Exposure to RAG architectures and secure multi-tenant environments.
- Excellent problem-solving and customer-facing communication skills.
Tech Stack
Docker, Grafana, Kubernetes, Prometheus, Python
Categories
AI & ML, DevOps