Senior Software Engineer – Serving (AI Platform)
Datadog
8 months ago
Paris, France
Senior / Mid Level
H1B Sponsor
Responsibilities
- Architect and build systems for serving ML and LLM models across all data centers with strong SLAs, observability, and reliability.
- Design and optimize Ray-based inference infrastructure to handle both low- and high-throughput workloads.
- Enable applied scientists to deploy and test models via self-service tools, CI/CD pipelines, and rollback mechanisms.
- Implement A/B testing and shadow deployment capabilities to evaluate new model versions in production.
- Collaborate with platform teams to improve GPU provisioning, traffic routing, and runtime performance.
- Instrument inference workflows with rich telemetry to drive performance and safety analysis.
Requirements
- 6+ years of backend or infrastructure engineering experience, including 2+ years working on ML/AI platforms.
- Experience building distributed systems, ideally in model serving, real-time inference, or large-scale APIs.
- Proficient in Python, Go, or another systems language, with an understanding of performance tuning in high-throughput environments.
- Familiar with Ray or other inference-serving frameworks like TorchServe, Triton, or BentoML.
- Experience with GPUs and knowledge of building infrastructure that supports heterogeneous compute workloads.
- Bonus points for experience with AI observability, rollback strategies, or in-house deployment proxies.
Benefits
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP).
- Continuous professional development, product training, and career pathing.
- Intradepartmental mentor and buddy program for in-house networking.
- An inclusive company culture with the ability to join Community Guilds.
- Access to Inclusion Talks (internal panel discussions).
- Free, global mental health benefits for employees and dependents age 6+.
- Competitive global benefits.
Tech Stack
Go, Python
Categories
AI & ML, Backend