
Senior Systems Engineer - AI Infrastructure
Clockwork.io, posted 21 days ago
Base Salary
$150k - $230k/yr
Responsibilities
- Design and implement low-level systems software for GPU clusters.
- Modify and extend frameworks and libraries such as PyTorch, NCCL, and the CUDA runtime.
- Build components that improve the reliability and efficiency of large-scale GPU training.
- Debug complex distributed and concurrent systems.
- Own systems end-to-end from design through production.
Requirements
- 8+ years of experience building systems software.
- Experience designing and building complex systems, not just deploying them.
- Strong C/C++ skills in systems contexts.
- Deep understanding of concurrency, memory models, and failure modes.
- Experience reasoning about distributed system behavior.
Benefits
- Challenging projects.
- A friendly and inclusive workplace culture.
- Competitive compensation.
- A great benefits package.
- Catered lunch.