Software Engineer, Site Reliability

about 2 months ago

H1B Sponsor

Base Salary

$180k - $250k/yr

Responsibilities

Own and operate our Kubernetes infrastructure, including cluster lifecycle and upgrades.
Build and maintain CI/CD pipelines and deployment infrastructure.
Use AI to automate the analysis and resolution of production issues.
Build dashboards, alerting, and anomaly detection across systems.
Define and enforce SLOs and develop incident response processes.
Manage and improve networking, load balancing, and service mesh configurations.
Drive reliability improvements through automation and chaos engineering.

Qualifications

5+ years of experience managing critical production systems.
Strong production experience with Kubernetes at scale using infrastructure-as-code.
Deep knowledge of Linux networking and container networking.
Experience building CI/CD systems and GitOps workflows.
Proficiency in Python and either Go or Bash for automation.
Strong experience with logging, monitoring, and alerting tools.
Excellent communication skills and ability to drive technical decisions.
Self-starter who executes quickly and seeks constant improvement.

Ansible, Bash, Datadog, Go, Grafana, Kubernetes, Prometheus, Python, Terraform