Staff SRE Engineer
Realtor.com Careers
Responsibilities
- Design and maintain highly available AWS infrastructure including EKS clusters and multi-region architectures.
- Own the reliability of critical services such as Skyway (CI/CD) and Frontdoor (Tyk API gateway).
- Establish SLIs, SLOs, and error budgets for Tier 1/2/3 systems.
- Drive adoption of reliability patterns including circuit breakers and automated failover.
- Build comprehensive observability using New Relic to enable rapid troubleshooting.
- Create actionable dashboards and alerts to reduce MTTD and MTTR.
- Analyze infrastructure spend and implement FinOps practices.
- Design chaos engineering experiments to identify system weaknesses.
- Lead game day exercises and disaster recovery simulations.
- Participate in the on-call rotation for critical systems and lead post-incident reviews.
- Mentor engineers on incident response and communication.
- Serve as a technical leader and mentor for the Operations Excellence team.
- Support security initiatives, such as the migration to AWS Secrets Manager.
Requirements
- 8+ years in Site Reliability Engineering, DevOps, or Infrastructure Engineering.
- Bachelor’s degree or equivalent experience.
- 5+ years of hands-on experience with AWS and Kubernetes.
- Strong programming skills in Python, Go, or Java.
- Production experience with observability tools and distributed systems architecture.
- Experience with CI/CD platforms and participating in on-call rotations.
- Preferred: experience with chaos engineering tools and API gateway technologies.
Benefits
- Inclusive and competitive medical, Rx, dental, and vision coverage.
- Family forming benefits.
- 13 paid holidays and flexible time off.
- 8 hours of paid volunteer time off.
- Immediate eligibility for the company 401(k) plan, with a 3.5% company match.
- Tuition reimbursement program for degree and non-degree programs.
- 1:1 personalized financial planning sessions.
- Student debt retirement savings match program.
- Free snacks and refreshments in each office location.