Responsibilities
- Design and operate distributed inference systems for LLMs, optimizing throughput, latency, and cost.
- Build large-scale data pipelines that ingest, transform, and curate datasets for training and evaluation.
- Debug complex production issues that arise under real traffic conditions.
- Collaborate with researchers and ML engineers to transition experimental workloads to production.
Requirements
- 5+ years of experience building and operating distributed systems in production.
- Deep experience with large-scale data or compute frameworks such as Ray, Spark, or Flink.
- Strong fluency in Python and at least one systems language such as Go, Rust, or C++.
- Working knowledge of the GPU/accelerator stack and CUDA fundamentals.
- Experience operating Kubernetes-based infrastructure, including custom operators or schedulers.
- Proven track record of managing production incidents from diagnosis to resolution.
Benefits
- Flexible work arrangements with in-person collaboration in the Bay Area and a global-first team.
- Annual travel stipend for exploring new countries.
- Weekly meal allowance for take-out or grocery delivery.
- Comprehensive medical benefits and generous paid time off.
