Databricks

Principal Engineer, Compute Fleet Management

3 months ago
Bellevue, WA, USA
Staff+
H1B Sponsor

Base Salary

$264k - $322k/yr

Responsibilities

  • Provision and pool billions of cloud resources for peak performance and efficiency.
  • Build architecture for horizontal scaling and resilience against failures.
  • Lead development of systems to manage the compute platform.
  • Achieve and maintain 99.99% availability for workloads.
  • Drive utilization metrics to 60% or higher while managing cloud failures.
  • Architect and enforce security and performance isolation for customer workloads.

Requirements

  • Proven experience in building and operating large-scale infrastructure systems.
  • Track record of leading complex, cross-team engineering initiatives.
  • Hands-on experience with high-scale distributed systems on major public clouds.
  • Ability to drive consensus and lead technical efforts across organizations.
  • Exceptional planning and project management skills.

Tech Stack

Apache Spark, AWS, Azure, Google Cloud Platform, MLflow

Categories

AI & ML, Data Engineering, DevOps