GrepJob
Arcadia

Principal Site Reliability Engineer

Arcadia
Apply
about 20 hours ago
New York, NY, USA or Remote, United States
Staff+

Base Salary

$180k - $230k/yr

Responsibilities

  • Act as the technical leader for reliability across one or more domains.
  • Drive reliability strategy by defining SLOs, SLIs, and reliability KPIs.
  • Lead incident response maturity and improve incident command practices.
  • Architect and implement automation to reduce toil and risk.
  • Advance GitOps delivery practices using Argo CD.
  • Scale infrastructure management with Crossplane and Terraform.
  • Lead operational readiness and reliability reviews for new features.
  • Improve performance and cost efficiency through capacity planning.
  • Champion infrastructure security best practices for PHI environments.
  • Mentor Staff and Senior engineers to raise reliability standards.

Requirements

  • 8+ years of experience in SRE, platform engineering, or related roles.
  • Demonstrated principal-level impact in leading cross-team initiatives.
  • Expertise in Kubernetes operations and troubleshooting.
  • Strong GitOps experience with Argo CD and Argo Workflows.
  • Experience with Crossplane and Terraform for infrastructure orchestration.
  • Deep AWS experience and understanding of reliability in cloud systems.
  • Proficiency in Python for automation and tooling.
  • Strong incident management and on-call leadership experience.
  • Excellent communication skills for translating technical risks.

Benefits

  • Be part of a mission-driven company transforming the healthcare industry.
  • Flexible, remote-friendly work environment.
  • Employee-driven programs for personal and professional development.
  • Join a diverse and purpose-driven community at Arcadia.

Tech Stack

Apache CassandraApache SparkArgo CDAWSKubernetesPythonTerraform

Categories

DevOpsSecurity