10 months ago
Base Salary
$180k - $350k/yr
Responsibilities
- Build Kubernetes orchestration on a $20M GPU cluster.
- Scale an AWS batch job system for map-reduce jobs across tens of thousands of machines.
- Design GPU scheduling software for optimal cluster utilization.
- Implement observability in production systems.
Requirements
- Experience designing and operating large-scale infrastructure such as GPU clusters or Kubernetes clusters.
- Strong focus on reliability, observability, and optimization across the stack.
Benefits
- In-person opportunity in San Francisco.
- Open to sponsoring international candidates (e.g., STEM OPT, OPT, H-1B, O-1, E-3).
