Responsibilities
- Lead reliability initiatives across multiple Ads domains, including ad serving and reporting.
- Partner with engineering leadership to improve reliability and scalability.
- Drive architecture reviews and influence technical decisions.
- Design and build platforms and automation to enhance reliability.
- Participate in on-call rotations and lead incident investigations.
- Identify systemic reliability risks and implement long-term solutions.
- Establish reliability metrics for critical user journeys.
- Mentor engineers and provide technical leadership.
Requirements
- 8+ years of experience in Site Reliability Engineering or related roles.
- Strong experience in high-traffic, user-facing production environments.
- Deep understanding of distributed systems and cloud-native architectures.
- Experience designing highly available systems with strong operational practices.
- Strong understanding of observability systems, including metrics and alerting.
- Good programming skills in languages such as Go or Python.
- Experience improving reliability through SLOs and automation.
- Demonstrated ability to troubleshoot complex issues in distributed systems.
Benefits
- Global benefit programs that fit your lifestyle.
- Family Planning Support.
- Gender-Affirming Care.
- Mental Health & Coaching Benefits.
- Private Medical, Dental, and Vision Benefits.
- Personal Retirement Savings Account with matching contribution.
- Cycle to Work and Tax Saver schemes.
- Flexible Vacation & Paid Volunteer Time Off.
- Generous Paid Parental Leave.
