Senior Site Reliability Engineer, Production Engineering
Anduril Industries
2 days ago
Seattle, WA, USA
Senior / Staff+
Base Salary
$166k - $220k/yr
Responsibilities
- Design and implement monitoring, observability, and alerting systems for the Lattice platform.
- Drive incident response and conduct blameless postmortems to improve production reliability.
- Build and maintain infrastructure automation using tools like Terraform and Kubernetes.
- Establish and track Service Level Objectives (SLOs) and error budgets.
- Partner with software engineering teams to enhance system architecture for reliability.
- Develop capacity planning models and performance testing frameworks.
- Create runbooks and documentation for effective operation of production systems.
- Lead efforts to improve deployment safety through automated testing and rollback capabilities.
- Implement security best practices for production environments handling sensitive data.
- Build tooling to reduce operational toil and improve efficiency.
- Participate in on-call rotations for critical production incidents.
Requirements
- 7+ years of engineering experience, including at least 3 years in SRE or production operations.
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- Deep expertise with Kubernetes in production environments.
- Strong programming skills in languages such as Go, Python, Rust, or Java.
- Experience designing and implementing observability stacks using tools like Prometheus and Grafana.
- Hands-on experience with cloud platforms like AWS, Azure, or GCP.
- Ability to debug complex distributed systems issues.
- Track record of improving system reliability through architectural changes.
- Strong incident management and communication skills.
- Must be a U.S. Person due to access to export-controlled information.
- Must be eligible to obtain and maintain an active U.S. Secret security clearance.
Benefits
- Comprehensive medical, dental, and vision plans at little to no cost for US roles.
- Life and disability insurance coverage for all employees.
- Highly competitive PTO plans with a holiday hiatus in December.
- Coverage for fertility treatments, adoption, and gestational carriers.
- Access to free mental health resources 24/7.
- Annual reimbursement for professional development.
- Company-funded commuter benefits based on your region.
- Relocation assistance available depending on role eligibility.
- Traditional 401(k), Roth, and after-tax options for US roles.
- Pension plan with employer match for UK and IE roles.
Tech Stack
Apache Cassandra, AWS, Azure, Go, Google Cloud Platform, Grafana, Istio, Java, Jenkins, Kubernetes, PostgreSQL, Prometheus, Python, Rust, Spinnaker, Terraform, Vault
Categories
DevOps, Security