Senior Backend Engineer, Inference Platform

H1B Sponsor

Base Salary

$160k - $250k/yr

Responsibilities

Build and optimize global and local request routing for low-latency load balancing.
Develop auto-scaling systems to dynamically allocate resources across data centers.
Design systems for multi-tenant traffic shaping and resource allocation.
Engineer trade-offs between latency and throughput for diverse workloads.
Optimize prefix caching to enhance model compute efficiency.
Collaborate with ML researchers to scale new model architectures.
Continuously profile and analyze system performance to identify bottlenecks.

Qualifications

5+ years of experience building large-scale, fault-tolerant distributed systems.
Strong background in designing and improving complex systems for efficiency and scalability.
Excellent understanding of low-level OS concepts such as multithreading and memory management.
Expert-level programming skills in Rust, Go, Python, or TypeScript.
Knowledge of modern LLMs and generative models is a plus.
Experience with the open source ecosystem around inference is highly valuable.
Familiarity with Kubernetes or container orchestration is a strong plus.
Knowledge of GPU software stacks and HPC technologies is a plus.
Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent experience.