Site Reliability Engineer

Atlanta, GA, USA

Mid Level / Senior

Responsibilities

Own and scale AWS-based cloud infrastructure using Terraform and IaC orchestration.
Build and operate Elastic Kubernetes Service (EKS) and serverless environments for core payments services.
Design and maintain CI/CD pipelines with GitLab for fast, safe deployments.
Implement monitoring and observability tools to ensure high uptime and quick incident resolution.
Automate infrastructure and operational processes to eliminate manual work.
Collaborate with application engineers to improve system performance and reliability.
Lead incident response efforts and conduct postmortems for continuous improvement.
Define and roll out SRE best practices as the company scales.
Optimize for cost, security, and compliance in a regulated fintech environment.
Support and scale Postgres database infrastructure using AWS RDS offerings.

Qualifications

3+ years of experience in SRE, DevOps, or cloud infrastructure roles, preferably in a startup or high-growth environment.
Strong hands-on experience with cloud infrastructure (AWS, Google Cloud, or Azure).
Deep experience with IaC using tools such as Terraform, OpenTofu, Terragrunt, or CloudFormation.
Solid production experience with container orchestration (Kubernetes or ECS).
Experience building CI/CD pipelines using tools like GitLab or GitHub Actions.
Strong understanding of monitoring and observability principles and design.
Proficiency in at least one modern programming language (e.g., Python, Java, Go, or Ruby).
Bachelor’s degree in Information Science, Computer Science, or a related discipline, or equivalent work experience, is preferred.