Base Salary
$180k - $360k/yr
Responsibilities
- Develop infrastructure and orchestration systems for large-scale distributed LLM inference.
- Work across the stack, from customer-facing features to low-level infrastructure components.
- Build platform capabilities related to routing, autoscaling, scheduling, observability, and runtime management.
- Improve the reliability, scalability, and usability of the inference stack.
- Collaborate with Model Performance engineers to implement inference optimizations.
- Define best practices for testing, release automation, benchmarking, and operational excellence.
- Debug complex production systems across Kubernetes, distributed runtimes, networking, and GPU workloads.
- Make engineering tradeoffs balancing performance, reliability, and developer experience.
- Own projects end to end, from architecture through deployment and feedback-driven iteration.
Requirements
- Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or a related field.
- Strong background in distributed systems, backend infrastructure, or platform engineering.
- Experience building and operating production systems with a focus on reliability and scale.
- Strong sense of developer experience and usability.
- Motivation to learn new languages, frameworks, and systems.
- Ability to debug complex systems across multiple layers of the stack.
- Genuine interest in inference engineering and willingness to learn.
- Excellent communication and collaboration skills.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and dependents.
- Flexible PTO policy including a company-wide Winter Break.
- Paid parental leave.
- Fertility and family-building stipend through Carrot.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups for learning and networking opportunities.
