Senior Site Reliability Engineer

3 months ago

Berlin, Germany

Senior

Responsibilities

Build out a self-service runtime platform for engineering teams.
Integrate software development practices into platform engineering.
Lead the overhaul of CI/CD pipelines in collaboration with product teams.
Ensure site reliability through observability and disaster recovery solutions.
Define and operate reliability standards through SLOs and error budgets.
Drive infrastructure cost optimization across various technologies.
Improve security posture through tooling and compliance work.
Collaborate with engineering teams on platform architecture.
Enhance developer productivity with platform services and tooling.
Serve as secondary on-call for incident response.

Requirements

5+ years in backend or infrastructure engineering, with at least 2 years in SRE or platform engineering.
Hands-on experience with GCP/AWS, Kubernetes, Terraform, and Helm in production.
Strong software development background in building frameworks and internal tooling.
Experience operating observability platforms such as Datadog at scale.
Proficiency in defining and operating SLOs and error budgets.
Solid understanding of Infrastructure as Code (IaC) and GitOps.
Proven track record in designing and troubleshooting complex distributed systems.

Tech Stack

AWS, Datadog, Google Cloud Platform, Helm, Kubernetes, MongoDB, Terraform, TypeScript

Categories

Backend, DevOps, Security