Base Salary
$253k - $355k/yr
Responsibilities
- Lead the design, implementation, and maintenance of a GPU-based model serving system.
- Develop ML and Generative AI systems in cloud environments on Kubernetes.
- Rapidly prototype and create a high-performance feature hydration and processing system.
- Establish a unified GPU model export framework for optimized inference models.
- Implement real-time ML observability to track feature and model performance.
- Serve LLMs online at scale.
- Build an end-to-end inference performance benchmarking framework.
- Understand multi-cluster compute environments and network topology for ML inference.
Requirements
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems like Kubernetes at scale.
- Deep knowledge of cloud technologies for ML platforms, including AWS and Google Cloud.
- Proficiency in programming languages such as Go and Python.
- Excellent communication skills for articulating technical concepts to non-technical stakeholders.
- Strong focus on scalability, reliability, performance, and user experience.
- Knowledge of model serving, inference pipelines, and observability for AI systems is a plus.
- Strong proficiency in Python and experience with AI/ML frameworks like Triton and PyTorch.
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs.
- 401(k) with Employer Match.
- Global Benefit programs that fit your lifestyle.
- Family Planning Support.
- Gender-Affirming Care.
- Mental Health & Coaching Benefits.
- Flexible Vacation & Paid Volunteer Time Off.
- Generous Paid Parental Leave.
