Responsibilities
- Design platform architecture for multi-tenant inference workloads.
- Develop robust API layers and developer SDKs for distributed inference orchestration.
- Build and harden a multi-tenant control plane for accurate metering and tenant isolation.
- Optimize inference performance across the entire system stack.
- Build observability tooling and SLOs that provide insight into system economics and performance.
- Partner with product and infrastructure teams on model onboarding and capacity planning.
- Promote a culture of engineering excellence within the team.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 7+ years of experience building and operating backend distributed systems.
- Demonstrated cross-team technical leadership in backend distributed systems or ML infrastructure.
- Strong fundamentals in data-intensive distributed systems and performance profiling.
- Hands-on experience with large-scale inference services on GPUs.
- Direct experience with inference engines or serving frameworks.
- Strong programming skills in C++, Go, Rust, or Python.
- Familiarity with deep learning frameworks and GPU computing primitives.
- Excellent verbal and written communication skills.
- Experience with autonomous vehicles is a bonus.