
Site Reliability Engineer (SRE)
Florence Healthcare - US
2 days ago
Atlanta, GA, USA
Mid Level / Senior
Responsibilities
- Participate as an embedded member of a Scrum team in planning and reviews.
- Use AI-powered tools to enhance system reliability and operational efficiency.
- Design, build, and operate reliable cloud infrastructure.
- Apply AI-assisted analysis to monitoring and observability data.
- Define and maintain SLOs, SLIs, and error budgets.
- Collaborate with software engineers to embed reliability into the development lifecycle.
- Lead incident response and root cause analysis efforts.
- Automate operational tasks through AI-enabled and traditional methods.
- Contribute to disaster recovery planning and operational readiness.
- Produce and maintain documentation such as runbooks and system diagrams.
Requirements
- Passionate about building reliable, scalable systems using AI-enabled approaches.
- Strong understanding of cloud-native and distributed system architectures.
- Experience applying SRE principles in a production environment.
- Hands-on experience with cloud platforms, preferably AWS.
- Experience using AI-assisted tools for coding and operational analysis.
- Strong background in Linux, networking, and system operations.
- Experience with infrastructure-as-code and automation tools like Terraform.
- Familiarity with modern observability practices, including AI-enhanced analysis.
- Comfortable working in an agile, cross-functional Scrum team.
- Strong problem-solving, communication, and collaboration skills.
- 4+ years of experience in SRE, DevOps, or similar roles.
- Experience supporting production systems at scale.
Benefits
- Competitive compensation package, including medical and dental insurance.
- Office space located in the heart of the city.
- Opportunity to work on impactful health technology projects.