Klaviyo

Senior Software Engineer, Reliability

about 5 hours ago
Dublin, Ireland
Senior
H1B Sponsor

Responsibilities

  • Build and operate foundational, security-critical services with a focus on availability and fault tolerance.
  • Automate infrastructure to reduce operational toil and improve system reliability.
  • Design and implement systems using SRE best practices.
  • Define and refine SLIs, SLOs, and error budgets.
  • Enhance observability, alerting, and incident response.
  • Participate in on-call rotations with a focus on sustainable operations.
  • Conduct quantitative analysis to understand system behavior and capacity constraints.
  • Identify systemic risks and drive long-term solutions.
  • Collaborate with product, platform, and security engineers.
  • Mentor and pair with other engineers to improve operational maturity.

Requirements

  • Proficient in writing production-quality code (e.g., Python, Go).
  • Experience with distributed, cloud-native systems and understanding of failure modes.
  • Familiarity with containerized workloads and platforms (e.g., Kubernetes).
  • Comfortable with on-call rotations and diagnosing production issues.
  • Experience designing and operating observability systems.
  • Knowledge of SRE concepts such as SLIs, SLOs, and error budgets.
  • Hands-on experience with infrastructure as code (e.g., Terraform).
  • Experience with capacity planning and performance analysis.
  • Ability to contribute to post-incident reviews.
  • Interest in experimenting with AI tools and workflows.

Tech Stack

Apache Kafka, AWS, Django, FastAPI, Kubernetes, MySQL, Python, RabbitMQ, React, Redis, Terraform

Categories