PandaDoc

Senior Site Reliability Engineer

3 months ago
Remote, Spain
Senior
H1B Sponsor

Responsibilities

  • Own and influence the incident management process end-to-end.
  • Maintain and evolve the on-prem observability stack.
  • Participate in the on-call rotation to keep production applications running smoothly.
  • Develop automations and tools to support platform reliability.
  • Contribute to production services with a focus on performance and resiliency.
  • Collaborate with product engineers to foster SRE principles.
  • Mentor the SRE team and product engineers.

Requirements

  • Solid programming experience in Python (Django and AsyncIO) and/or Java (Spring Boot).
  • Experience maintaining an observability tool suite, specifically the LGTM stack (Loki, Grafana, Tempo, Mimir).
  • Experience developing and maintaining Python services in production.
  • Strong experience with AWS and Kubernetes.
  • Proficiency in working with relational databases (PostgreSQL) and messaging systems (e.g., RabbitMQ, NATS, Kafka).
  • Experience as an on-call SRE.
  • Hands-on troubleshooting skills in distributed systems.
  • Strong communication skills and a desire to share knowledge.
  • Proficiency in English, both written and spoken.

Benefits

  • Remote-first approach with the option for hybrid work from offices in Kyiv, Warsaw, and Lisbon.
  • Long-term collaboration, supported by a range of employment arrangements.
  • Work schedule aligned with EU time zones.
  • Honest, open culture that values constructive feedback.
  • Opportunities for professional and personal development within a supportive team.
  • Stable yet growing SaaS product offering an agile environment and strong technical challenges.

Tech Stack

Apache Kafka, AWS, Django, Grafana, Java, Kubernetes, PostgreSQL, Python, RabbitMQ, Spring Boot

Categories