Kaseya

Site Reliability Engineer

8 days ago
Toronto, Canada
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Set, monitor, and enforce SLOs, SLIs, and error budgets.
  • Lead incident response, troubleshooting, and blameless postmortems.
  • Build and maintain automated deployment and infrastructure provisioning.
  • Manage cloud and hybrid infrastructure with Terraform or CloudFormation.
  • Improve observability through proactive monitoring and alerting.
  • Partner with development teams to integrate reliability into the SDLC.
  • Reduce operational toil through automation and self-recovering systems.
  • Support containerized and serverless workloads for high availability.
  • Stay current on SRE, cloud, and observability practices.

Requirements

  • 4 to 5 years of AWS production experience.
  • IaC ownership with Terraform or CloudFormation.
  • AWS ECS production experience or strong Kubernetes background.
  • Active on-call rotation experience with incident management.
  • Working fluency with SLOs, SLIs, and error budgets.

Tech Stack

Ansible, AWS, Chef, Datadog, Elasticsearch, Kibana, Kubernetes, MySQL, PostgreSQL, Puppet, Terraform
