Base Salary
$267k–$287k/yr
Responsibilities
- Design and implement production-grade AI platform architectures on Kubernetes and public cloud infrastructure.
- Partner with customer platform, infrastructure, and ML engineering teams to deploy and optimize distributed AI workloads.
- Lead implementation engagements including platform installation, networking, security, and operational readiness.
- Troubleshoot complex distributed systems issues across infrastructure, Kubernetes, networking, and AI applications.
- Develop automation, tooling, and infrastructure-as-code to accelerate customer success.
- Build trusted relationships with technical leaders and translate business objectives into technical solutions.
- Collaborate with Product and Engineering to communicate customer requirements and shape platform capabilities.
- Share best practices through technical documentation, architecture guidance, and workshops.
Requirements
- 5+ years of experience in cloud infrastructure, platform engineering, DevOps, or software engineering.
- Experience building or operating ML/AI platforms for model training and large-scale data processing.
- Strong expertise with Kubernetes and containerized production environments.
- Experience with AWS, Azure, or GCP, including networking, security, and infrastructure automation.
- Familiarity with Infrastructure as Code tools such as Terraform, CI/CD pipelines, and other modern DevOps tooling.
- Strong software engineering skills in Python, Go, Java, or similar languages.
- Experience in customer-facing engineering roles such as consulting or solutions architecture.
- Excellent communication skills for engaging with both technical and executive stakeholders.
- Familiarity with distributed computing frameworks like Ray or Spark is a plus.
- Willingness to travel as needed to work with strategic customers.
