Site Reliability Engineer

8 days ago

Toronto, Canada

Mid Level / Senior

H1B Sponsor

Responsibilities

Set, monitor, and enforce SLOs, SLIs, and error budgets.
Lead incident response, troubleshooting, and blameless postmortems.
Build and maintain automated deployment and infrastructure provisioning.
Manage cloud and hybrid infrastructure with Terraform or CloudFormation.
Improve observability through proactive monitoring and alerting.
Partner with development teams to integrate reliability into the SDLC.
Reduce operational toil through automation and self-recovering systems.
Support containerized and serverless workloads, keeping them highly available.
Stay current on SRE, cloud, and observability practices.

Requirements

4 to 5 years of AWS production experience.
IaC ownership with Terraform or CloudFormation.
AWS ECS production experience or strong Kubernetes background.
Active on-call rotation experience with incident management.
Working fluency with SLOs, SLIs, and error budgets.

Tech Stack

Ansible, AWS, Chef, Datadog, Elasticsearch, Kibana, Kubernetes, MySQL, PostgreSQL, Puppet, Terraform

Categories

DevOps, Security