Klaviyo

Senior SRE, Site Reliability Engineer

27 days ago
Dublin, Ireland
Senior
H1B Sponsor

Responsibilities

  • Build and operate foundational, security-critical services with a focus on availability and fault tolerance.
  • Automate infrastructure to reduce operational toil and improve system reliability.
  • Design and implement systems using SRE best practices.
  • Define and refine SLIs, SLOs, and error budgets.
  • Enhance observability, alerting, and incident response.
  • Participate in on-call rotations with a focus on sustainable operations.
  • Conduct quantitative analysis to understand system behavior and capacity constraints.
  • Identify systemic risks and drive long-term solutions.
  • Collaborate with product, platform, and security engineers.
  • Mentor and pair with other engineers to improve operational maturity.

Requirements

  • Experience writing and maintaining production-quality code in languages like Python or Go.
  • Proven experience with distributed, cloud-native systems and understanding of failure modes.
  • Hands-on experience with containerized workloads and platforms such as Kubernetes.
  • Comfortable with on-call rotations and diagnosing production issues.
  • Experience designing and operating observability systems.
  • Familiarity with SRE concepts like SLIs, SLOs, and error budgets.
  • Experience with infrastructure as code tools like Terraform.
  • Experience in capacity planning, load testing, and performance analysis.
  • Experience contributing to post-incident reviews and driving follow-up actions.
  • Comfortable reviewing technical designs and system documentation.

Tech Stack

Apache Kafka, AWS, Django, FastAPI, Kubernetes, MySQL, Python, RabbitMQ, React, Redis, Terraform

Categories

DevOps, Security