Lead Engineer, Inference Platform

MongoDB

5 months ago

Palo Alto, CA, USA

Senior / Staff+

H1B Sponsor

Base Salary

$137k - $270k/yr

Responsibilities

Partner with AI engineers to productionize embedding models and rerankers for batch and real-time inference.
Lead projects focused on performance optimization, GPU utilization, autoscaling, and observability.
Design and build components of a multi-tenant inference service integrated with Atlas Vector Search.
Contribute to platform features like model versioning, safe deployment pipelines, and model health monitoring.
Collaborate with cross-functional teams to define architectural patterns supporting high availability and low latency.
Guide decisions on model serving architecture using tools like vLLM and Kubernetes.
Provide technical leadership and mentorship to junior engineers.

Qualifications

8+ years of engineering experience in backend systems, ML infrastructure, or scalable platform development.
Expertise in serving embedding models in production environments.
Strong systems skills in languages like Go, Rust, C++, or Python.
Experience building cloud-native distributed systems with a focus on low latency and high availability.
Familiarity with inference runtimes and vector search systems.
Proven ability to collaborate across disciplines and experience levels.
Experience with high-scale SaaS infrastructure in multi-tenant environments.
1+ years of experience as a technical lead for a large-scale ML inference or training platform.

AWS, Azure, C++, Go, Google Cloud, Kubernetes, MongoDB, Python, Rust