Reddit

Staff Site Reliability Engineer - Site Experience

about 3 hours ago
Remote, United Kingdom
Staff+
H1B Sponsor

Responsibilities

  • Lead reliability engineering for user experience by driving operational excellence.
  • Architect systems for scalability and high availability under global load.
  • Identify and mitigate systemic risks and reliability bottlenecks.
  • Automate operational tasks to improve deployment safety and incident response.
  • Manage complex incident responses and ensure long-term fixes are implemented.
  • Define best practices for reliability engineering and operational maturity.
  • Mentor engineers and shape the reliability culture across the organization.

Requirements

  • 8+ years of experience in Site Reliability Engineering or related roles.
  • Strong collaboration and communication skills to influence technical direction.
  • Experience supporting high-traffic, user-facing production environments.
  • Deep understanding of distributed systems, networking, and Linux systems.
  • Experience designing highly available systems with strong reliability practices.
  • Strong programming skills in languages such as Go or Python.
  • Understanding of observability systems including metrics and alerting.
  • Ability to troubleshoot complex issues across applications and infrastructure.

Benefits

  • Global benefit programs that fit your lifestyle.
  • Family planning support.
  • Gender-affirming care.
  • Mental health and coaching benefits.
  • Group personal pension scheme with employer match.
  • Private medical and dental scheme.
  • Income replacement programs.
  • Bike to work scheme.
  • Flexible vacation and paid volunteer time off.
  • Generous paid parental leave.

Tech Stack

  • Ambassador
  • Apache Cassandra
  • Apache Kafka
  • ClickHouse
  • Go
  • Grafana
  • Kubernetes
  • Prometheus
  • Python
  • Redis
