
Site Reliability Engineer
Rainforest
3 months ago
Atlanta, GA, USA
Mid Level / Senior
Responsibilities
- Own and scale AWS-based cloud infrastructure using Terraform and IaC orchestration.
- Build and operate Elastic Kubernetes Service (EKS) and serverless environments for core payments services.
- Design and maintain CI/CD pipelines with GitLab for fast, safe deployments.
- Implement monitoring and observability tools to ensure high uptime and quick incident resolution.
- Automate infrastructure and operational processes to eliminate manual work.
- Collaborate with application engineers to improve system performance and reliability.
- Lead incident response efforts and conduct postmortems for continuous improvement.
- Define and roll out SRE best practices as the company scales.
- Optimize for cost, security, and compliance in a regulated fintech environment.
- Support and scale Postgres database infrastructure using AWS RDS offerings.
Requirements
- 3+ years of experience in SRE, DevOps, or cloud infrastructure roles, preferably in a startup or high-growth environment.
- Strong hands-on experience with cloud infrastructure (AWS, Google Cloud, or Azure).
- Deep experience with IaC using tools such as Terraform, OpenTofu, Terragrunt, or CloudFormation.
- Solid production experience with container orchestration (Kubernetes, ECS).
- Experience building CI/CD pipelines using tools like GitLab and GitHub Actions.
- Strong understanding of monitoring and observability principles and design.
- Proficiency in at least one modern programming language (e.g., Python, Java, Go, or Ruby).
- Bachelor’s degree in Information Science, Computer Science, or a related discipline, or equivalent work experience, is preferred.
Benefits
- Comprehensive health benefits package.
- Unlimited paid time off.
- Paid parental leave.
- Fun and flexible working environment.
- Continuous investment in employee development and company culture.