Navan

Senior AI Operations (AI Ops) Engineer

5 days ago
Palo Alto, CA, USA
Senior

Base Salary

$116k - $258k/yr

Responsibilities

  • Build and own the runtime environment for 100+ specialized AI services.
  • Design and implement SageMaker Multi-Model Endpoints and Inference Components.
  • Build deterministic shells around probabilistic LM outputs for reliability.
  • Implement automated benchmarking to detect semantic drift and hallucinations.
  • Create reusable patterns and Terraform-based infrastructure for deployment.
  • Collaborate with AI Researchers to optimize agentic autonomy.

Requirements

  • 5+ years in SRE, Platform Engineering, or MLOps, including 2 years running LLMs/SLMs in production.
  • Deep expertise with AWS SageMaker, especially Multi-Model Endpoints.
  • Experience with Small Language Models and parameter-efficient fine-tuning strategies.
  • Strong proficiency in Python and Terraform.
  • Experience with Docker, Kubernetes, or AWS ECS/Fargate.
  • Familiarity with Snowflake and Vector Databases.
  • Understanding of AI at scale as a statistical challenge.
  • Experience building CI/CD pipelines for non-deterministic software.
  • BS or MS in Computer Science, Engineering, Mathematics, or related field.

Tech Stack

AWS, Docker, GitHub Actions, Jenkins, Kubernetes, Python, Snowflake, Terraform

Categories

AI & ML, Data Engineering, DevOps