Robinhood

Staff Reliability Engineer - Robinhood Command Center

28 days ago
New York, NY, USA
Staff+
H1B Sponsor

Base Salary

$169k - $255k/yr

Responsibilities

  • Serve as a senior technical leader driving reliability and observability strategy.
  • Partner with engineers to enhance operational excellence and incident response.
  • Lead incident mitigation efforts and facilitate time-sensitive decisions.
  • Develop and maintain incident management processes for timely resolution.
  • Own incident discovery by defining global dashboards and alerts.
  • Evolve incident response tooling and processes, measuring improvements.
  • Drive post-incident governance and define standards for reviews.
  • Design next-generation failure mitigation strategies.
  • Define frameworks to improve monitoring and observability.
  • Deliver insights and executive-level reporting on service quality.
  • Mentor engineers and contribute to hiring and engineering culture.

Requirements

  • 8+ years of software engineering experience with production systems.
  • 4+ years focused on reliability engineering or production operations.
  • Hands-on experience in incident leadership roles.
  • Strong communication and collaboration skills during incidents.
  • Deep knowledge of systems reliability and fault-tolerant architecture.
  • Experience with multi-region architectures and failover strategies.
  • Familiarity with modern observability stacks like OpenTelemetry and Grafana.
  • Proven ability to drive measurable improvements in reliability metrics.

Benefits

  • Challenging, high-impact work to grow your career.
  • Performance-driven compensation with bonuses and equity ownership.
  • 100% paid health insurance for employees and 90% for dependents.
  • Flexible benefits spending account for wellness and learning.
  • Employer-paid life and disability insurance, plus fertility and mental health benefits.
  • Time off for company holidays, paid time off, and parental leave.
  • Exceptional office experience with catered meals and events.

Tech Stack

Grafana, Prometheus

Categories

DevOps, Security