
Senior Backend Engineer, Inference Platform
Together AI4 days ago
Base Salary
$160k - $250k/yr
Responsibilities
- Build and optimize global and local request routing for low-latency load balancing.
- Develop auto-scaling systems to dynamically allocate resources across data centers.
- Design systems for multi-tenant traffic shaping and resource allocation.
- Engineer trade-offs between latency and throughput for diverse workloads.
- Optimize prefix caching to enhance model compute efficiency.
- Collaborate with ML researchers to scale new model architectures.
- Continuously profile and analyze system performance to identify bottlenecks.
Requirements
- 5+ years of experience in building large-scale, fault-tolerant distributed systems.
- Strong background in designing and improving complex systems for efficiency and scalability.
- Excellent understanding of low-level OS concepts like multi-threading and memory management.
- Expert-level programming skills in Rust, Go, Python, or TypeScript.
- Knowledge of modern LLMs and generative models is a plus.
- Experience with the open source ecosystem around inference is highly valuable.
- Familiarity with Kubernetes or container orchestration is a strong plus.
- Knowledge of GPU software stacks and HPC technologies is a plus.
- Bachelor’s or Master’s degree in Computer Science or related field, or equivalent experience.
Benefits
- Competitive compensation and equity.
- Health insurance and other competitive benefits.