Responsibilities
- Partner with Ads Engineering teams to improve the reliability and scalability of ad-serving systems.
- Design, build, and maintain infrastructure and automation for service reliability.
- Improve observability through monitoring, alerting, and logging.
- Participate in on-call rotations and lead incident response for production systems.
- Conduct post-incident root cause analysis and implement corrective actions.
- Collaborate with software engineers throughout the service lifecycle.
- Drive adoption of SRE best practices, including service level indicators (SLIs) and service level objectives (SLOs).
- Reduce operational toil through automation and self-service tooling.
- Define and measure advertiser-critical user journeys.
- Scale Ads systems to meet traffic growth and business requirements.
Requirements
- 5+ years of experience in Site Reliability Engineering or related roles.
- Strong experience in high-traffic, user-facing production environments.
- Good understanding of distributed systems, networking, and cloud architectures.
- Proficiency in programming languages such as Go or Python.
- Ability to troubleshoot complex issues across applications and infrastructure.
- Experience with observability platforms and incident response.
- Proven track record in driving automation and operational improvements.
Benefits
- Global benefit programs tailored to your lifestyle.
- Family planning support and gender-affirming care.
- Mental health and coaching benefits.
- Employer-matching private pension plan.
- 100% employer-sponsored group medical plan.
- Income replacement programs.
- Flexible vacation and paid volunteer time off.
- Generous paid parental leave.
