Navan

Senior Site Reliability Engineer

3 days ago
Dallas, TX, USA or Austin, TX, USA
Senior / Staff+

Responsibilities

  • Build a fast-moving, high-growth service for enterprise travel and expense.
  • Design, implement, and operate cloud infrastructure with a focus on infrastructure as code.
  • Identify reliability anti-patterns and improve system visibility and reliability.
  • Automate processes to reduce toil and empower users.
  • Leverage AI tools to achieve autonomous operations and improve observability.
  • Define and drive the adoption of system reliability standards across engineering teams.
  • Drive the adoption of AI-assisted developer tools to enhance productivity.

Requirements

  • 5+ years of experience as a Senior SRE or DevOps Lead.
  • 2+ years in a production, 24x7 product environment.
  • Strong problem-solving skills and eagerness to learn new technologies.
  • Excellent communication skills for stakeholder collaboration.
  • Experience mentoring junior engineers and leading infrastructure projects.
  • Hands-on operational experience with Java applications and performance tuning.
  • Experience with distributed systems in a public cloud environment, preferably AWS.
  • Proficiency in microservice architecture and reliability patterns.
  • Experience with Infrastructure as Code using Terraform or similar tools.
  • Strong scripting skills in languages like Python or Bash.
  • Experience with monitoring systems such as New Relic or Datadog.
  • Hands-on experience with deploying and monitoring AI/ML microservices.
  • Ability to integrate AI-specific telemetry for predictive insights.

Tech Stack

AWS, Bash, Datadog, Go, Groovy, Java, Jenkins, Kibana, Maven, Node.js, Python, Terraform

Categories