PandaDoc

Senior Site Reliability Engineer

4 months ago
Remote, Portugal
Senior
H1B Sponsor

Responsibilities

  • Own and influence the incident management process end-to-end.
  • Maintain and evolve the on-prem observability stack.
  • Participate in the on-call rotation to keep production applications running smoothly.
  • Develop automations and tools to support platform reliability.
  • Contribute to production services with performance and resiliency in mind.
  • Collaborate with product engineers to foster SRE principles within the R&D organization.
  • Mentor the SRE team or product engineers.

Requirements

  • Solid programming experience in Python (Django and AsyncIO) and/or Java (Spring Boot).
  • Experience in maintaining an observability tools suite, specifically LGTM (Loki, Grafana, Tempo, Mimir).
  • Experience in development and maintenance of Python services in production.
  • Strong experience with AWS and Kubernetes.
  • Proficiency in working with relational databases (PostgreSQL) and messaging systems (e.g., RabbitMQ, NATS, Kafka).
  • Experience as an on-call SRE.
  • Enjoy hands-on troubleshooting of distributed systems in production environments.
  • Strong communication skills and a desire to share knowledge on reliability.
  • Proficiency in English, both written and spoken.

Benefits

  • Remote-first approach with the option for hybrid work from offices in Kyiv, Warsaw, and Lisbon.
  • Long-term collaboration valued through various employment arrangements.
  • Work schedule aligned with EU time zones.
  • Honest, open culture that values constructive feedback.
  • Opportunities for professional and personal development within a collaborative team.
  • Stable yet growing SaaS product offering an agile environment and strong technical challenges.

Tech Stack

Apache Kafka, AWS, Django, Grafana, Java, Kubernetes, PostgreSQL, Python, RabbitMQ, Spring Boot