about 2 hours ago
Toronto, Canada
Senior
Responsibilities
- Design, implement, and maintain highly available and scalable infrastructure solutions.
- Monitor and analyze system performance to identify and resolve bottlenecks.
- Automate infrastructure deployment and configuration management processes.
- Continuously improve system reliability, security, and efficiency.
- Troubleshoot and resolve complex infrastructure and application issues.
- Collaborate with software engineering teams to design resilient systems.
- Participate in on-call rotation and respond to production incidents.
- Document system configurations and operational guidelines.
Requirements
- Proven experience as a Site Reliability Engineer or in a similar role.
- Strong understanding of networking, operating systems, and cloud infrastructure.
- Experience with site reliability engineering, system design, and distributed computing.
- Proficiency in languages and runtimes such as Node.js, Java, Python, Ruby, and Go.
- Experience with container and orchestration technologies such as Docker and Kubernetes.
- Knowledge of infrastructure-as-code tools like Terraform and Pulumi.
- Familiarity with monitoring and logging tools such as Prometheus and Grafana.
- Experience with relational databases and distributed SQL databases is a bonus.
- Experience working with Git and GitHub.
- Strong problem-solving and troubleshooting skills.
- Excellent communication and collaboration abilities.
Benefits
- Opportunity to work with cutting-edge technology in a rapidly growing sector.
- A supportive environment where your ideas lead to real impact.
- Competitive salary based on experience.
- Stock options at an early-stage startup.
- Comprehensive benefits including healthcare and other insurance.
- Fully remote work and a flexible schedule to accommodate different time zones.
- Twice-yearly travel for team offsites focused on team bonding and collaboration.
