Base Salary
$141k–$217k/yr
Responsibilities
- Design and improve observability, monitoring, alerting, and incident response across the Unified Call platform.
- Build and mature deployment processes that improve reliability and reduce operational risk.
- Partner closely with Platform Engineering and the Unified Call engineering team to improve system resiliency and uptime.
- Analyze system architecture, identify operational risks, and implement improvements before they become customer-impacting incidents.
- Develop dashboards, automation, and operational tooling that enable engineers to confidently operate production systems.
- Help establish Site Reliability Engineering best practices as the organization continues to scale.
Requirements
- 5+ years of experience as a Site Reliability Engineer, Infrastructure Engineer, Platform Engineer, or similar role.
- Strong experience with observability platforms, preferably Datadog.
- Experience supporting Kubernetes-based cloud infrastructure, preferably on AWS.
- Experience designing monitoring, alerting, deployments, and operational automation for production systems.
- Experience working with message queues, event-driven architectures, or messaging platforms such as RabbitMQ or Kafka.
- Working knowledge of cloud networking fundamentals.
Benefits
- Competitive salary and 401(k) with employer match.
- Discretionary time off.
- Paid parental leave for all.
- Medical, dental, and vision plans.
- Fitness programs.
- Emotional and development programs.
- Snacks available in the offices.
