4 months ago
Base Salary
$225K - $550K/yr
Responsibilities
- Design and scale high-performance inference serving systems.
- Optimize KV-cache management, batching strategies, and scheduling.
- Improve throughput and latency for long-context workloads.
- Build and maintain distributed RL and post-training infrastructure.
- Improve reliability of rollout, evaluation, and reward pipelines.
- Automate fault detection and recovery for serving and RL systems.
- Profile and eliminate performance bottlenecks across GPU, networking, and storage layers.
- Collaborate with Kernels and Research to align execution systems with model architecture.
Requirements
- Strong software engineering and distributed systems fundamentals.
- Experience building or operating large-scale inference or training systems.
- Deep understanding of GPU execution constraints and memory trade-offs.
- Experience debugging performance issues in production ML systems.
- Ability to reason about system-level trade-offs between latency, throughput, and cost.
- Track record of owning critical production infrastructure.
Benefits
- Annual salary range: $225K - $550K.
- Equity is a significant part of total compensation, in addition to salary.
- 401(k) plan with 6% salary matching.
- Generous health, dental and vision insurance for you and your dependents.
- Unlimited paid time off.
- Visa sponsorship and a relocation stipend to bring you to SF, where possible.
