Site Reliability Engineer - Storage Engineer

4 months ago

Toronto, Canada

Mid Level / Senior

H1B Sponsor

Responsibilities

Automate and maintain day-to-day operations of storage systems.
Develop and maintain tools and automation scripts for storage operations.
Monitor system performance and implement solutions for high availability.
Participate in agile practices including daily stand-ups and code reviews.
Continuously improve system reliability and performance through monitoring and optimization.

Requirements

2+ years of professional experience with Ceph in a production environment.
2+ years of experience in site reliability engineering or a similar role.
Experience with deployment, configuration, and management of Ceph clusters.
Proficiency in Linux/Unix systems with a focus on automation.
Proficiency in Python or Bash scripting.
Experience with Ansible, Terraform, or SaltStack.
Familiarity with Nagios-based monitoring tools like Icinga2.
Experience with observability tools such as Prometheus and Grafana.
Solid understanding of core networking concepts related to Linux/Unix systems.

Benefits

Paid time off and retirement savings options.
Bonus/incentive eligibility and equity grants.
Participation in the employee stock purchase plan.
Competitive health benefits and family-friendly perks including parental leave.
Support for a diverse culture and employee resource groups.

Tech Stack

Ansible, AWS, Bash, Docker, Grafana, Kubernetes, Linux, Nagios, OpenStack, Prometheus, Python, Terraform

Categories

Backend, DevOps