Base Salary
$196k - $230k/yr
Responsibilities
- Drive the long-term reliability and observability strategy across Robinhood’s infrastructure.
- Collaborate with engineers to enhance operational excellence and incident response.
- Lead incident mitigation efforts and facilitate time-sensitive decisions during incidents.
- Develop and maintain incident management processes to minimize customer impact.
- Own incident discovery by defining and maintaining global dashboards and alerts.
- Evolve incident response tooling and processes, focusing on MTTD/MTTR improvements.
- Drive post-incident governance and learning to ensure reliability improvements.
- Design next-generation failure mitigation strategies.
- Define frameworks to improve monitoring and observability across services.
- Deliver insights and executive-level reporting on service quality and reliability.
- Mentor engineers and contribute to hiring and engineering culture.
Requirements
- 5+ years of software engineering experience with production systems.
- 2+ years focused on reliability engineering or production operations.
- Hands-on experience in incident leadership roles.
- Strong communication and collaboration skills during high-severity incidents.
- Deep knowledge of systems reliability and fault-tolerant architecture design.
- Experience with multi-region architectures and failover strategies.
- Familiarity with modern observability stacks like OpenTelemetry and Grafana.
- Proven ability to drive improvements in MTTD, MTTR, and service availability.
Benefits
- Challenging, high-impact work to grow your career.
- Performance-driven compensation with bonuses and equity ownership.
- 100% paid health insurance for employees and 90% for dependents.
- Flexible benefits spending account for wellness and learning.
- Employer-paid life and disability insurance, plus fertility and mental health benefits.
- Time off that includes company holidays, paid time off, sick leave, and parental leave.
- Exceptional office experience with catered meals and comfortable workspaces.