Principal Engineer, Compute Platform
3 days ago
Remote (United States) or San Francisco, CA, USA
Staff+
H-1B Sponsor
Base Salary
$243k - $500k/yr
Responsibilities
- Lead the transition from isolated compute resources to a large-scale shared compute platform.
- Collaborate with leads across platforms to develop features and migration paths.
- Drive utilization of the shared compute platform through workload optimization.
- Ensure the platform meets the diverse needs of multiple internal customers.
- Guide a team of engineers on design, execution, and performance topics.
- Develop a multi-cloud abstraction layer for workload management.
- Set high standards for production quality and engineering excellence.
- Work on capacity planning and efficiency for virtual machine resources.
- Focus on delivering GPU resources for AI workloads.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 12+ years of experience with large-scale, production distributed systems.
- 5+ years of experience with Kubernetes in production environments.
- Experience collaborating with Software Engineering and Site Reliability Engineering teams.
- Preferred: experience running distributed data systems and migrating them to Kubernetes.
- Strong ability to work with cross-functional partners.
- Passion for automation and building effective tooling.
Tech Stack
Kubernetes
Categories
AI & ML, Backend, Data Engineering, DevOps