Washington, DC, USA
Mid Level / Senior
Responsibilities
- Proactively notify stakeholders of potential and actual issues affecting service delivery.
- Communicate frequently and succinctly with leadership during and after incidents.
- Identify trends and implement corrective measures.
- Provide metrics to leadership for performance assessment.
- Build monitoring and production support solutions for customer visibility.
- Triage and resolve production incidents related to the cloud platform.
- Lead the implementation of solutions and automate manual processes.
- Participate in the creation and maintenance of technical documentation.
- Troubleshoot production issues and develop technical solutions.
- Collaborate with IT and business teams to streamline production support processes.
Requirements
- Experience in production monitoring and support within a 24x7x365 operational environment.
- Strong expertise in incident management, root cause analysis, and problem resolution for cloud-based applications.
- Hands-on experience with Amazon Web Services (AWS) and cloud-based monitoring tools.
- Proficiency in ITIL processes and experience managing ITIL engineers.
- Ability to build and implement monitoring solutions and automate processes.
- Experience with system health monitoring and performance optimization.
- Strong leadership skills for collaborating with IT and business teams.
- Effective communication skills for status updates and incident reporting.
- Ability to develop and maintain technical documentation.
- Experience in triaging and resolving production incidents.
Benefits
- Highly competitive benefits and professional development opportunities.
- A culture that embraces flexibility, innovation, and collaboration.
- Commitment to employee well-being and personal goals.
