Responsibilities
- Lead monitoring and alerting improvements in production environments.
- Reduce the rate of unactionable alerts, which currently stands at 50%.
- Remediate noisy alerts to enhance operational efficiency.
- Develop severity classes and alert playbooks for incident management.
- Utilize various AWS tools to support site reliability initiatives.
Requirements
- Extensive experience with monitoring, alerting, and troubleshooting in production environments.
- Proficiency with tools like Splunk, Datadog, and ServiceNow.
- Experience in reducing unactionable alerts and remediating noisy alerts.
- Ability to develop severity classes and alert playbooks.
- Bachelor’s degree or equivalent experience in technology.
Benefits
- Competitive compensation package.
- Professional development opportunities.
- Flexible work arrangements.
- Supportive culture of collaboration and continuous learning.
