Heidi

Senior Site Reliability Engineer

about 1 month ago
Melbourne, Australia or London, United Kingdom
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Participate in on-call and incident response, progressing to leading incidents over time.
  • Identify and drive fixes for recurring issues and reliability risks.
  • Operate and improve Kubernetes clusters and cloud infrastructure.
  • Enhance observability through improved dashboards and alerts.
  • Automate repetitive tasks and simplify operational processes.
  • Support safe change with improved deployment and rollback mechanisms.
  • Write and maintain runbooks and participate in post-mortems.
  • Collaborate with engineers to improve service reliability.

Requirements

  • 3–6+ years in SRE, DevOps, or operations-heavy engineering roles.
  • Experience supporting production systems and on-call rotations.
  • Comfortable debugging live systems under pressure.
  • Experience operating cloud infrastructure, preferably AWS.
  • Working knowledge of Kubernetes and containerized workloads.
  • Experience with Infrastructure as Code tools like Terraform.
  • Familiarity with monitoring and alerting tools such as Datadog.
  • Scripting or automation experience in Python or Bash.

Benefits

  • Equity from day one, sharing in the company's success.
  • Personal development budget and wellness days.
  • Flexible hybrid work environment with 3 days in the office.
  • Opportunity to work alongside world-class talent.
  • Impactful role in shaping international expansion.

Tech Stack

AWS, Bash, Datadog, Kubernetes, Prometheus, Python, Terraform
