Site Reliability Engineer

3 months ago

Remote, United Kingdom

Mid Level / Senior

H1B Sponsor

Responsibilities

Act as a primary or escalation responder in a 24x7 on-call rotation.
Lead or support Major Incident (MI) response, including triage, mitigation, and resolution.
Coordinate across Engineering, Infrastructure, Security, and Product teams.
Execute and improve runbooks, playbooks, and escalation paths.
Drive blameless post-incident reviews (PIRs) and track corrective actions.
Own service health monitoring across infrastructure, applications, and dependencies.
Design and maintain alerting strategies that align with SLIs/SLOs.
Reduce alert fatigue through signal-to-noise improvements.
Build dashboards using tools such as Grafana, Prometheus, Datadog, Splunk, and CloudWatch.
Automate repetitive operational tasks to reduce manual toil.
Reduce mean time to detect (MTTD) and mean time to resolve (MTTR).
Develop scripts and tools in Python, Bash, Go, or similar to support NOC/SRE workflows.
Implement self-healing and auto-remediation where possible.
Partner with engineering teams to improve system design for reliability.
Support and troubleshoot Linux-based systems, cloud platforms, and Kubernetes environments.
Assist with capacity planning and availability reviews.
Ensure operational readiness for production releases.

Requirements

Strong Linux systems administration skills.
Experience with incident management and production support.
Familiarity with cloud infrastructure, preferably AWS.
Experience with containers and orchestration tools like Docker and Kubernetes.
Knowledge of monitoring and alerting platforms.
Scripting or programming experience in Python, Bash, Go, or similar.
Understanding of networking fundamentals such as DNS, TCP/IP, and load balancing.
Experience working in 24x7 NOC or production operations environments.
Ability to handle high-pressure incidents calmly and effectively.
Strong written and verbal communication skills for incident coordination.
Comfort working from runbooks and improving them when necessary.