Ardent

Reliability Engineer

about 4 hours ago
Washington, DC, USA
Mid Level / Senior

Responsibilities

  • Proactively notify stakeholders of potential and actual issues impacting service delivery.
  • Communicate frequently and succinctly with leadership during and after incidents.
  • Identify trends and implement corrective measures.
  • Provide metrics to leadership for performance assessment.
  • Build monitoring and production support solutions for customer visibility.
  • Triage and resolve production incidents related to the cloud platform.
  • Lead implementation of solutions and automate manual processes.
  • Participate in the creation and maintenance of technical documentation.
  • Troubleshoot production issues and develop technical solutions.
  • Collaborate with IT and business teams to streamline production support processes.

Requirements

  • Experience in Production Monitoring & Support within a 24x7x365 operational environment.
  • Strong expertise in incident management, root cause analysis, and problem resolution for cloud-based applications.
  • Hands-on experience with Amazon Web Services (AWS) and cloud-based monitoring tools.
  • Proficiency in ITIL processes and experience managing ITIL engineers.
  • Ability to build and implement monitoring solutions and automate processes.
  • Experience with system health monitoring and performance optimization.
  • Strong leadership skills for collaboration with IT and business teams.
  • Effective communication skills for updates and incident reporting.
  • Ability to develop and maintain technical documentation.
  • Experience in triaging and resolving production incidents.

Benefits

  • Highly competitive benefits and professional development opportunities.
  • A culture that embraces flexibility, innovation, and collaboration.
  • Commitment to employee well-being and personal goals.
