Staff Platform Site Reliability Specialist (Observability & Kubernetes)
Everbridge
about 21 hours ago
Montréal, Canada
Staff+
H1B Sponsor
Responsibilities
- Own the design, operation, and evolution of the observability stack.
- Build and maintain a highly available, scalable observability platform.
- Standardize instrumentation, dashboards, alerts, and SLOs.
- Support incident response, root cause analysis, and capacity planning.
- Operate and scale Grafana and its associated technologies.
- Maintain reliability and security of EKS clusters.
- Manage cluster lifecycle and upgrades.
- Use Terraform for infrastructure provisioning.
Requirements
- 6+ years of experience in SRE or Platform Engineering.
- Strong experience with the Grafana ecosystem.
- Expertise in Kubernetes and Amazon EKS.
- Proficiency in Terraform.
Benefits
- Comprehensive healthcare and dental care.
- Mental health benefits.
- Disability income benefits.
- Life and AD&D insurance.
- Retirement savings plan with employer match.
- Paid time off.
Tech Stack
AWS, GitLab CI/CD, Google Cloud Platform, Grafana, Kubernetes, Terraform
Categories
DevOps, Security