Responsibilities
- Partner with Ads Engineering teams to enhance ad-serving system reliability and scalability.
- Design, build, and maintain infrastructure and automation for service reliability.
- Improve observability through monitoring, alerting, and logging.
- Participate in on-call rotations and lead incident response for production systems.
- Conduct root cause analysis and implement corrective actions post-incident.
- Collaborate with software engineers throughout the service lifecycle.
- Promote SRE best practices, including SLIs, SLOs, and capacity planning.
- Automate processes to reduce operational toil.
- Define and measure advertiser-critical user journeys.
- Scale Ads systems to meet growing traffic and advertiser demand.
Requirements
- 5+ years of experience in Site Reliability Engineering or related roles.
- Strong experience in high-traffic, user-facing production environments.
- Good understanding of distributed systems, networking, and cloud architectures.
- Proficient in programming languages such as Go or Python.
- Ability to troubleshoot complex issues across applications and infrastructure.
- Experience with observability platforms and incident response.
- Proven track record in driving automation and operational improvements.
Benefits
- Global benefit programs tailored to your lifestyle.
- Family planning support and gender-affirming care.
- Mental health and coaching benefits.
- Group personal pension scheme with employer match.
- Private medical and dental scheme.
- Income replacement programs.
- Bike-to-work scheme.
- Flexible vacation and paid volunteer time off.
- Generous paid parental leave.
