Sr. Site Reliability Engineer

about 2 months ago

Toronto, Canada

Senior

Responsibilities

Design, implement, and maintain highly available and scalable infrastructure solutions.
Monitor and analyze system performance to identify and resolve bottlenecks.
Automate infrastructure deployment and configuration management processes.
Continuously improve system reliability, security, and efficiency.
Troubleshoot and resolve complex infrastructure and application issues.
Collaborate with software engineering teams to design resilient systems.
Participate in on-call rotation and respond to production incidents.
Document system configurations and operational guidelines.

Requirements

Proven experience as a Site Reliability Engineer or in a similar role.
Strong understanding of networking, operating systems, and cloud infrastructure.
Experience with site reliability engineering, system design, and distributed computing.
Proficiency in programming languages and runtimes such as Node.js, Java, Python, Ruby, and Go.
Experience with containerization technologies like Docker and Kubernetes.
Knowledge of infrastructure-as-code tools like Terraform and Pulumi.
Familiarity with monitoring and logging tools such as Prometheus and Grafana.
Experience with relational databases and distributed SQL databases is a bonus.
Experience working with Git and GitHub.
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration abilities.

Benefits

Opportunity to work with cutting-edge technology in a rapidly growing sector.
A supportive environment where your ideas lead to real impact.
Competitive salary based on experience.
Stock options at an early-stage startup.
Comprehensive benefits including healthcare and other insurance.
Fully remote work with a flexible schedule that accommodates different time zones.
Twice-yearly travel for team offsites focused on team bonding and collaboration.