Base Salary
$149k - $186k/yr
Responsibilities
- Contribute to the observability strategy and roadmap.
- Design and enhance scalable observability solutions.
- Establish best practices for monitoring, alerting, and incident management.
- Support operational excellence by improving incident response processes.
- Collaborate on cross-team initiatives to improve system reliability.
- Apply automation and AI-assisted workflows for root cause analysis.
- Work with stakeholders to surface observability insights.
- Analyze system and user signals to mitigate reliability issues.
- Optimize observability platforms for performance and cost-efficiency.
- Mentor peers and raise observability standards within the team.
Requirements
- Solid hands-on experience in observability engineering or related roles.
- Strong expertise in monitoring and observability practices.
- Experience with observability or reliability initiatives across teams.
- Proficiency with Kubernetes and cloud infrastructure (e.g., AWS).
- Ability to influence technical decisions and collaborate with stakeholders.
- Good understanding of distributed systems principles.
- Experience defining and implementing SLOs, SLIs, and alerting strategies.
- Strong software engineering fundamentals in at least one modern programming language.
- Experience improving systems through automation.
- Strong analytical and problem-solving skills.
- Good communication and collaboration skills.
- A sense of ownership and accountability.
Benefits
- A range of health plans, including mental health support and fitness benefits.
- Generous paid time off (PTO) and sick leave.
- Annual bonus and long-term incentive opportunities.
- 401(k) with up to a 5% match.
- Commuter benefits and pet insurance.
