22 days ago
San Francisco, CA, USA or Sunnyvale, CA, USA
Senior / Staff+
Base Salary
$253k - $288k/yr
Responsibilities
- Build the Virtual Pool Service as the single source of truth for GPU node states.
- Design and implement Capacity Management Intelligence to automate allocation and forecasting.
- Collaborate across teams to architect infrastructure management systems.
- Champion reliability, scalability, and security in cloud architectures.
- Streamline cloud deployment and operations using Go and other technologies.
- Set technical direction for infrastructure intelligence systems and roadmap planning.
- Represent the Platform Engineering team in technical forums.
- Lead complex architectural decisions and drive alignment across teams.
- Mentor engineers and close gaps in organizational engineering capability.
Requirements
- Bachelor's degree in Computer Science or Software Engineering.
- 12+ years of experience building and operating distributed systems at scale.
- Proven experience with reliable, scalable, and secure cloud platforms.
- Strong understanding of distributed systems and their failure modes.
- Fluency in Go, Rust, Java, or C++, with a preference for Go.
- Ability to define and drive multi-year technical strategies.
- Track record of owning high-stakes technical problems.
- Experience influencing engineering culture and standards.
- Excellent communication and troubleshooting skills.
Benefits
- Industry-competitive pay.
- Restricted Stock Units in a fast-growing technology company.
- Health insurance options including HDHP and PPO.
- Employer contributions to HSAs.
- Paid Parental Leave.
- Paid life insurance and disability coverage.
- 401(k) with a 100% match up to 4% of salary.
- Generous paid time off and holiday schedule.
- Cell phone reimbursement and tuition reimbursement.
- Company-paid commuter benefit of $300 per month.
