Florence Healthcare - US

Site Reliability Engineer (SRE)

Atlanta, GA, USA
Mid-Level / Senior

Responsibilities

  • Participate in sprint planning and reviews as an embedded member of a Scrum team.
  • Use AI-powered tools to enhance system reliability and operational efficiency.
  • Design, build, and operate reliable cloud infrastructure.
  • Apply AI-assisted analysis to monitoring and observability data.
  • Define and maintain SLOs, SLIs, and error budgets.
  • Collaborate with software engineers to embed reliability into the development lifecycle.
  • Lead incident response and root cause analysis efforts.
  • Automate operational tasks through AI-enabled and traditional methods.
  • Contribute to disaster recovery planning and operational readiness.
  • Produce and maintain documentation such as runbooks and system diagrams.

Requirements

  • Passionate about building reliable, scalable systems using AI-enabled approaches.
  • Strong understanding of cloud-native and distributed system architectures.
  • Experience applying SRE principles in a production environment.
  • Hands-on experience with cloud platforms, preferably AWS.
  • Experience using AI-assisted tools for coding and operational analysis.
  • Strong background in Linux, networking, and system operations.
  • Experience with infrastructure-as-code and automation tools like Terraform.
  • Familiarity with modern observability practices, including AI-enhanced analysis.
  • Comfortable working in an agile, cross-functional Scrum team.
  • Strong problem-solving, communication, and collaboration skills.
  • 4+ years of experience in SRE, DevOps, or similar roles.
  • Experience supporting production systems at scale.

Benefits

  • Competitive compensation package, including medical and dental insurance.
  • Office space located in the heart of the city.
  • Opportunity to work on impactful health technology projects.