Base Salary
$314k - $465k/yr
Responsibilities
- Drive technical vision for Lambda's Managed Kubernetes platform.
- Integrate and extend NVIDIA's open-source ecosystem for AI workloads.
- Design GPU-aware orchestration systems and lead service development.
- Inform networking and storage architecture requirements for AI workloads.
- Build the foundation for Managed Slurm on Kubernetes.
- Design higher-level platform services for inference and autoscaling.
- Establish operational excellence for managed services.
- Serve as a technical bridge between Orchestration and other infrastructure teams.
- Champion consistency and standardization across Lambda's infrastructure.
- Mentor engineers and establish best practices for Kubernetes development.
Requirements
- 10+ years of experience in software engineering, platform engineering, or SRE.
- Expert-level understanding of Kubernetes internals and extension patterns.
- Holistic infrastructure expertise across compute, networking, storage, and security.
- Strong software engineering skills in Go and Python.
- Deep experience with GPU orchestration in Kubernetes.
- Proven track record of technical leadership and mentoring.
- Experience designing and operating managed services or multi-tenant platforms.
- Strong understanding of distributed systems principles.
- Experience with observability tools at scale.
- Solid knowledge of Linux systems and high-performance networking.
Benefits
- Generous cash and equity compensation.
- Health, dental, and vision coverage for you and your dependents.
- Wellness and commuter stipends for select roles.
- 401(k) plan with a 2% company match for U.S. employees.
- Flexible paid time off.
