Pragmatike

Senior Site Reliability Engineer / Kubernetes (Remote)

about 13 hours ago
Rome, Italy +3 more
Senior

Responsibilities

  • Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
  • Deploy, manage, and scale Kubernetes clusters across various environments.
  • Oversee full cluster lifecycle including upgrades and security hardening.
  • Implement automation for provisioning and operations using Ansible and GitOps.
  • Design and maintain networking architecture including VLANs and VPNs.
  • Build automated deployment workflows.
  • Deploy and maintain observability stacks like Prometheus and Grafana.
  • Lead incident response and escalation activities.
  • Improve system availability and reduce latency.
  • Define and implement SLOs/SLIs at multiple infrastructure levels.
  • Optimize alerting and monitoring pipelines.
  • Establish and maintain on-call schedules.
  • Develop Standard Operating Procedures (SOPs) for operations.
  • Coordinate physical maintenance for Policlouds.
  • Manage virtualization and orchestration layers.
  • Help develop and maintain overall architecture across products.
  • Plan resources for future initiatives.
  • Work with development teams to improve quality and resource utilization.
  • Collaborate with cross-functional stakeholders.

Requirements

  • Expert-level experience operating Kubernetes in production environments.
  • Strong network engineering skills including VLANs and L2/L3 routing.
  • Proficiency with Linux systems administration (Debian/Ubuntu).
  • Solid understanding of networking fundamentals.
  • Experience building and maintaining automation workflows.
  • Experience with observability stacks such as Prometheus and ELK.
  • Background with virtualization technologies like OpenStack and VMware.
  • Experience with bare-metal provisioning and MAAS.
  • Strong understanding of distributed systems and container orchestration.
  • Process-oriented mindset with ability to develop SOPs.
  • Experience with incident response and on-call rotations.
  • Ability to work autonomously in a fast-paced environment.
  • Strong technical skills aligned with team values.

Benefits

  • 100% remote work with flexible hours.
  • High-impact role with autonomy and ownership.
  • Collaborative and international engineering team.
  • Cutting-edge tech stack focused on reliability and automation.

Tech Stack

Ansible, Bash, Cloudflare, Grafana, Graylog, Istio, Kubernetes, Linux, OpenStack, Prometheus, Python
