Remote, Worldwide +2 more
Mid Level / Senior
H-1B Sponsor

Base Salary

$135k - $285k/yr

Responsibilities

  • Own the reliability of Baseten's multi-cloud Kubernetes infrastructure.
  • Build and maintain observability infrastructure as code.
  • Author and improve runbooks for recurring failure patterns.
  • Identify high-frequency failure patterns and create automated mitigations.
  • Diagnose and resolve runtime issues related to system performance.
  • Define and instrument SLOs and SLIs across services.
  • Navigate ambiguity and make principled tradeoffs in system design.

Requirements

  • Extensive hands-on experience with Kubernetes, preferably multi-cloud.
  • Experience in building and maintaining scalable infrastructure.
  • Strong foundation in observability tooling and practices.
  • Experience with infrastructure-as-code and GitOps workflows.
  • Experience writing runbooks and leading incident responses.
  • Comfortable working at the intersection of engineering and operations.
  • Familiarity with incident management platforms is a plus.

Benefits

  • Competitive compensation with meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents.
  • Flexible PTO policy including a company-wide Winter Break.
  • Paid parental leave and fertility/family-building stipend.
  • Company-facilitated 401(k) plan.
  • Exposure to various ML startups for learning and networking.

Tech Stack

Grafana, Helm, Kubernetes, Prometheus, Terraform
