Staff Reliability Engineer - Robinhood Command Center
Robinhood
28 days ago
New York, NY, USA
Staff+
H1B Sponsor
Base Salary
$169k - $255k/yr
Responsibilities
- Serve as a senior technical leader driving reliability and observability strategy.
- Partner with engineers to enhance operational excellence and incident response.
- Lead incident mitigation efforts and facilitate time-sensitive decisions.
- Develop and maintain incident management processes for timely resolution.
- Own incident discovery by defining global dashboards and alerts.
- Evolve incident response tooling and processes, measuring improvements.
- Drive post-incident governance and define standards for reviews.
- Design next-generation failure mitigation strategies.
- Define frameworks to improve monitoring and observability.
- Deliver insights and executive-level reporting for service quality.
- Mentor and contribute to hiring and engineering culture.
Requirements
- 8+ years of software engineering experience with production systems.
- 4+ years focused on reliability engineering or production operations.
- Hands-on experience in incident leadership roles.
- Strong communication and collaboration skills during incidents.
- Deep knowledge of systems reliability and fault-tolerant architecture.
- Experience with multi-region architectures and failover strategies.
- Familiarity with modern observability stacks like OpenTelemetry and Grafana.
- Proven ability to drive measurable improvements in reliability metrics.
Benefits
- Challenging, high-impact work to grow your career.
- Performance-driven compensation with bonuses and equity ownership.
- 100% paid health insurance for employees and 90% for dependents.
- Flexible benefits spending account for wellness and learning.
- Employer-paid life and disability insurance, plus fertility and mental health benefits.
- Time off, including company holidays, paid time off, and parental leave.
- Exceptional office experience with catered meals and events.
Tech Stack
Grafana, Prometheus
Categories
DevOps, Security