
Senior Site Reliability Engineer (Remote)
Pragmatike
1 day ago
Tallinn, Estonia +5 more
Senior
Responsibilities
- Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
- Deploy, manage, and scale Kubernetes clusters across various environments.
- Oversee full cluster lifecycle including upgrades and security hardening.
- Implement automation for provisioning and operations using Ansible and GitOps workflows.
- Design and maintain networking architecture including VLANs and VPNs.
- Build automated deployment workflows and maintain observability stacks.
- Lead incident response and improve system availability.
- Define and implement SLOs/SLIs at multiple infrastructure levels.
- Optimize alerting and monitoring pipelines for actionable insights.
- Establish and maintain on-call schedules for coverage across timezones.
- Develop Standard Operating Procedures (SOPs) for operations.
- Coordinate physical maintenance for Policlouds and manage virtualization layers.
- Plan resources for future initiatives and collaborate with development teams.
Requirements
- Expert-level experience operating Kubernetes in production environments.
- Strong network engineering skills including VLANs and L2/L3 routing.
- Proficiency with Linux systems administration (Debian/Ubuntu).
- Solid understanding of networking fundamentals and complex architectures.
- Experience building automation workflows with Ansible and Python.
- Familiarity with observability stacks like Prometheus and Grafana.
- Background in virtualization technologies such as OpenStack and VMware.
- Experience with bare-metal provisioning, including MAAS.
- Strong understanding of distributed systems and container orchestration.
- Ability to develop SOPs and operational procedures from scratch.
- Experience with incident response and on-call rotations.
- Ability to work autonomously in a fast-paced environment.
Benefits
- 100% remote work with flexible hours.
- High-impact role with autonomy and ownership.
- Collaborative and international engineering team.
- Cutting-edge tech stack focused on reliability and automation.