San Francisco, CA, USA
Mid Level / Senior
H-1B Sponsor
Base Salary
$350k - $850k/yr
Responsibilities
- Serve as the dedicated reliability owner for the Knowledge Work training environments.
- Own a clean, canonical set of evaluation tools and processes for Knowledge Work capabilities.
- Build and automate observability, dashboards, and operational tooling for training environments.
- Proactively harden environments and evaluation systems through load testing and fault injection.
- Act as the primary point of contact for partner training and infrastructure teams.
- Reduce the operational burden on researchers so they can focus on research.
Requirements
- Highly experienced Python engineer who ships reliable, well-instrumented code.
- Demonstrated experience operating ML or distributed systems at scale.
- Strong SRE or production-engineering mindset focused on SLOs and load tests.
- Foundational ML knowledge to understand training environments and evaluation metrics.
- Ability to read research code and reason about evaluation integrity.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.
Tech Stack
Python
Categories
AI & ML
Data Engineering
DevOps