SecurityScorecard

Senior Site Reliability Engineer

Base Salary

$152k - $195k/yr

Responsibilities

  • Design, build, and scale Kubernetes infrastructure for secure, multi-tenant applications.
  • Build and operate AI tooling infrastructure, establishing secure AI access for production systems.
  • Optimize and maintain CI/CD pipelines for improved reliability and speed.
  • Implement progressive delivery strategies like blue/green and canary deployments.
  • Advance Infrastructure as Code with Terraform, Helm, and Argo CD.
  • Operate and optimize streaming and analytics infrastructure such as Kafka and Flink.
  • Integrate automated testing into the CI/CD lifecycle.
  • Define SLOs, alerts, and dashboards to enhance system observability.
  • Lead incident response and postmortems to address root causes.
  • Mentor engineers on Kubernetes, CI/CD, and cloud infrastructure.

Requirements

  • 6+ years in SRE, DevOps, or infrastructure roles with significant production Kubernetes experience.
  • Hands-on experience integrating AI/LLM tooling into workflows, and an understanding of the associated security considerations.
  • Proven success building CI/CD pipelines with tools such as GitHub Actions or Jenkins.
  • Strong knowledge of Kubernetes internals and managed Kubernetes services such as EKS or GKE.
  • Expertise in Infrastructure as Code with Terraform, Helm, or Pulumi.
  • Proficiency in programming and scripting languages such as Python, Bash, or Go.
  • Familiarity with observability tools like Prometheus or Grafana.
  • Production experience with Kafka, Flink, and ClickHouse.
  • Strong communication and cross-team collaboration skills.

Benefits

  • Competitive salary and stock options.
  • Health benefits and unlimited PTO.
  • Parental leave and tuition reimbursement.

Tech Stack

Apache Flink, Apache Kafka, Argo CD, Bash, ClickHouse, Datadog, GitHub Actions, GitLab CI/CD, Go, Grafana, Helm, Jenkins, Kubernetes, Prometheus, Python, Terraform