Responsibilities
- Design and build systems that improve the efficiency of ML training and inference workloads.
- Develop tooling that helps ML engineers debug, profile, optimize, and monitor model performance.
- Improve GPU and general resource utilization through scheduling, resource management, caching, and workload optimization.
- Partner with ML researchers and product teams to identify bottlenecks and drive performance improvements.
- Build benchmarking frameworks and performance dashboards for training and serving systems.
- Optimize distributed training infrastructure, data pipelines, and model serving architectures.
- Lead cross-functional initiatives that improve the productivity of Reddit ML engineers.
- Drive technical strategy for ML platform scalability, reliability, and cost efficiency.
Requirements
- BS, MS, or PhD in Computer Science or a related field.
- 5+ years of software engineering experience.
- Strong proficiency in Python.
- Proficiency in at least one systems language (Go, C++, Rust, or Java) preferred.
- Experience building distributed systems at scale.
- Experience with machine learning infrastructure, training systems, or model serving platforms.
- Deep understanding of performance engineering and systems optimization.
- Strong debugging and profiling skills.
Benefits
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support.
- Family Planning Support.
- Gender-Affirming Care.
- Mental Health & Coaching Benefits.
- Group Personal Pension Scheme with Employer match.
- Private Medical and Dental Scheme.
- Income Replacement Programs.
- Bike to Work scheme.
- Flexible Vacation & Paid Volunteer Time Off.
- Generous Paid Parental Leave.
