Senior Site Reliability Engineer, Wikimedia Enterprise

about 2 months ago

Remote, Worldwide

Senior

H1B Sponsor

Base Salary

$117k - $181k/yr

Responsibilities

Define, track, and improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets.
Build and enhance observability systems for proactive detection and troubleshooting.
Drive reliability engineering practices, including capacity planning and load testing.
Improve developer experience by enabling self-service infrastructure.
Partner with engineering teams to embed reliability best practices early in development.
Design and optimize CI/CD and GitOps workflows for automated deployments.
Implement secure-by-default infrastructure and enforce security best practices.
Continuously optimize infrastructure cost and efficiency using FinOps principles.
Establish and track operational metrics to drive continuous improvement.
Reduce operational toil by implementing automation-first solutions.
Contribute to and evolve internal platform capabilities for scalability.
Collaborate with a globally distributed team.
Mentor peers in technical and operational areas.

Qualifications

Experience with Infrastructure as Code and automation tools like Terraform or Ansible.
Proficiency in at least one programming language such as Python or Go.
Experience designing and operating cloud-based systems on platforms like AWS, Azure, or GCP.
Familiarity with CI/CD pipelines and GitOps workflows.
Experience with incident response and leading postmortems.
Strong understanding of SRE best practices, including SLOs and observability.
Ability to work effectively in a distributed, cross-functional environment.
Familiarity with Wikimedia or other open source projects is a plus.