Remote, Worldwide +2 more
Mid Level / Senior
H1B Sponsor
Base Salary
$135k - $285k/yr
Responsibilities
- Own the reliability of Baseten's multi-cloud Kubernetes infrastructure.
- Build and maintain observability infrastructure as code.
- Author and improve runbooks for recurring failure patterns.
- Identify high-frequency failure patterns and create automated mitigations.
- Diagnose and resolve runtime issues related to system performance.
- Define and instrument SLOs and SLIs across services.
- Navigate ambiguity and make principled tradeoffs in system design.
Requirements
- Extensive hands-on experience with Kubernetes, preferably multi-cloud.
- Experience in building and maintaining scalable infrastructure.
- Strong foundation in observability tooling and practices.
- Experience with infrastructure-as-code and GitOps workflows.
- Experience writing runbooks and leading incident response.
- Comfortable working at the intersection of engineering and operations.
- Familiarity with incident management platforms is a plus.
Benefits
- Competitive compensation with meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and dependents.
- Flexible PTO policy including a company-wide Winter Break.
- Paid parental leave and fertility/family-building stipend.
- Company-facilitated 401(k) plan.
- Exposure to various ML startups for learning and networking.
