Senior SRE, Site Reliability Engineer
Klaviyo
27 days ago
Dublin, Ireland
Senior
H1B Sponsor
Responsibilities
- Build and operate foundational, security-critical services with a focus on availability and fault tolerance.
- Automate infrastructure to reduce operational toil and improve system reliability.
- Design and implement systems using SRE best practices.
- Define and refine SLIs, SLOs, and error budgets.
- Enhance observability, alerting, and incident response.
- Participate in on-call rotations with a focus on sustainable operations.
- Conduct quantitative analysis to understand system behavior and capacity constraints.
- Identify systemic risks and drive long-term solutions.
- Collaborate with product, platform, and security engineers.
- Mentor and pair with other engineers to improve operational maturity.
Requirements
- Experience writing and maintaining production-quality code in languages like Python or Go.
- Proven experience with distributed, cloud-native systems and an understanding of their failure modes.
- Hands-on experience with containerized workloads and platforms such as Kubernetes.
- Comfortable with on-call rotations and diagnosing production issues.
- Experience designing and operating observability systems.
- Familiarity with SRE concepts like SLIs, SLOs, and error budgets.
- Experience with infrastructure as code tools like Terraform.
- Experience in capacity planning, load testing, and performance analysis.
- Experience contributing to post-incident reviews and driving follow-up actions.
- Comfortable reviewing technical designs and system documentation.
Tech Stack
Apache Kafka, AWS, Django, FastAPI, Kubernetes, MySQL, Python, RabbitMQ, React, Redis, Terraform
Categories
DevOps, Security