Staff Software Engineer - AI Research Infrastructure
Databricks
Posted about 3 hours ago
Base Salary
$199k - $270k/yr
Responsibilities
- Design and implement infrastructure for large-scale experiments and model training.
- Build abstractions for job submission, scheduling, and monitoring to speed up research workflows.
- Create tooling to enhance research developer productivity, including experiment management systems.
- Influence the long-term roadmap for research computation.
- Mentor and support other engineers in compute, infrastructure, and AI systems.
Requirements
- BS, MS, or PhD in Computer Science or a related field.
- 5+ years of software engineering experience with large-scale distributed systems.
- Deep experience in building and operating distributed systems or large-scale backend services.
- Proficiency in systems programming languages such as C++, Rust, Go, Java, or Scala.
- Experience with cluster schedulers or job orchestration systems like Kubernetes or Slurm.
- Understanding of modern ML training and inference workflows.
- Ability to communicate effectively with researchers and engineers.