
Senior Site Reliability Engineer
SecurityScorecard
about 3 hours ago
Base Salary
$152k - $195k/yr
Responsibilities
- Design, build, and scale Kubernetes infrastructure for secure, multi-tenant applications.
- Build and operate AI tooling infrastructure, establishing secure AI access for production systems.
- Optimize and maintain CI/CD pipelines for improved reliability and speed.
- Implement progressive delivery strategies like blue/green and canary deployments.
- Advance Infrastructure as Code with Terraform, Helm, and Argo CD.
- Operate and optimize streaming and analytics infrastructure such as Kafka and Flink.
- Integrate automated testing into the CI/CD lifecycle.
- Define SLOs, alerts, and dashboards to enhance system observability.
- Lead incident response and postmortems to address root causes.
- Mentor engineers on Kubernetes, CI/CD, and cloud infrastructure.
Requirements
- 6+ years in SRE, DevOps, or Infrastructure roles with significant production Kubernetes experience.
- Hands-on experience integrating AI/LLM tooling into workflows, with an understanding of the associated security considerations.
- Proven success in building CI/CD pipelines using tools like GitHub Actions or Jenkins.
- Strong knowledge of Kubernetes internals and managed services like EKS or GKE.
- Expertise in Infrastructure as Code with Terraform, Helm, or Pulumi.
- Proficiency in programming languages such as Python, Bash, or Go.
- Familiarity with observability tools like Prometheus or Grafana.
- Production experience with Kafka, Flink, and ClickHouse.
- Strong communication and cross-team collaboration skills.
Benefits
- Competitive salary and stock options.
- Health benefits and unlimited PTO.
- Parental leave and tuition reimbursement.
Tech Stack
Apache Flink, Apache Kafka, Argo CD, Bash, ClickHouse, Datadog, GitHub Actions, GitLab CI/CD, Go, Grafana, Helm, Jenkins, Kubernetes, Prometheus, Python, Terraform