Anduril Industries

Senior Site Reliability Engineer, Production Engineering

2 days ago
Seattle, WA, USA
Senior / Staff+

Base Salary

$166k - $220k/yr

Responsibilities

  • Design and implement monitoring, observability, and alerting systems for the Lattice platform.
  • Drive incident response and conduct blameless postmortems to improve production reliability.
  • Build and maintain infrastructure automation using tools like Terraform and Kubernetes.
  • Establish and track Service Level Objectives (SLOs) and Error Budgets.
  • Partner with software engineering teams to enhance system architecture for reliability.
  • Develop capacity planning models and performance testing frameworks.
  • Create runbooks and documentation for effective operation of production systems.
  • Lead efforts to improve deployment safety through automated testing and rollback capabilities.
  • Implement security best practices for production environments handling sensitive data.
  • Build tooling to reduce operational toil and improve efficiency.
  • Participate in on-call rotations for critical production incidents.

Requirements

  • 7+ years of engineering experience, including at least 3 years in SRE or production operations.
  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.
  • Deep expertise with Kubernetes in production environments.
  • Strong programming skills in languages such as Go, Python, Rust, or Java.
  • Experience designing and implementing observability stacks using tools like Prometheus and Grafana.
  • Hands-on experience with cloud platforms like AWS, Azure, or GCP.
  • Ability to debug complex distributed systems issues.
  • Track record of improving system reliability through architectural changes.
  • Strong incident management and communication skills.
  • Must be a U.S. Person due to access to export-controlled information.
  • Must be eligible to obtain and maintain an active U.S. Secret security clearance.

Benefits

  • Comprehensive medical, dental, and vision plans at little to no cost for US roles.
  • Life and disability insurance coverage for all employees.
  • Highly competitive PTO plans with a holiday hiatus in December.
  • Coverage for fertility treatments, adoption, and gestational carriers.
  • Access to free mental health resources 24/7.
  • Annual reimbursement for professional development.
  • Company-funded commuter benefits based on your region.
  • Relocation assistance available depending on role eligibility.
  • Traditional 401(k), Roth 401(k), and after-tax contribution options for US roles.
  • Pension plan with employer match for UK and IE roles.

Tech Stack

Apache Cassandra, AWS, Azure, Go, Google Cloud Platform, Grafana, Istio, Java, Jenkins, Kubernetes, PostgreSQL, Prometheus, Python, Rust, Spinnaker, Terraform, Vault

Categories

DevOps, Security