Principal Engineer, Compute Fleet Management
Databricks
3 months ago
Bellevue, WA, USA
Staff+
H1B Sponsor
Base Salary
$264k - $322k/yr
Responsibilities
- Provision and pool billions of cloud resources for peak performance and efficiency.
- Build architecture for horizontal scaling and resilience against failures.
- Lead development of systems to manage the compute platform.
- Achieve and maintain 99.99% availability for workloads.
- Drive fleet utilization to 60% or higher while managing cloud failures.
- Architect and enforce security and performance isolation for customer workloads.
Requirements
- Proven experience in building and operating large-scale infrastructure systems.
- Track record of leading complex, cross-team engineering initiatives.
- Hands-on experience with high-scale distributed systems on major public clouds.
- Ability to drive consensus and lead technical efforts across organizations.
- Exceptional planning and project management skills.
Tech Stack
Apache Spark, AWS, Azure, Google Cloud Platform, MLflow
Categories
AI & ML, Data Engineering, DevOps