Senior Site Reliability Engineer

5 months ago

Remote, Portugal

Senior

H1B Sponsor

Responsibilities

Own and influence the incident management process end-to-end.
Maintain and evolve the on-prem observability stack.
Participate in the on-call rotation to keep production applications running smoothly.
Develop automations and tools to support platform reliability.
Contribute to production services with performance and resiliency in mind.
Collaborate with product engineers to foster SRE principles within the R&D organization.
Mentor SRE team members and product engineers.

Requirements

Solid programming experience in Python (Django and AsyncIO) and/or Java (Spring Boot).
Experience maintaining an observability tool suite, specifically LGTM (Loki, Grafana, Tempo, Mimir).
Experience developing and maintaining Python services in production.
Strong experience with AWS and Kubernetes.
Proficiency in working with relational databases (PostgreSQL) and messaging systems (e.g., RabbitMQ, NATS, Kafka).
Experience as an on-call SRE.
Enjoy hands-on troubleshooting of distributed systems in production environments.
Strong communication skills and a desire to share knowledge on reliability.
Proficiency in English, both written and spoken.

Benefits

Remote-first approach with the option for hybrid work from offices in Kyiv, Warsaw, and Lisbon.
Long-term collaboration, supported by a range of employment arrangements.
Work schedule aligned with EU time zones.
Honest, open culture that values constructive feedback.
Opportunities for professional and personal development within a collaborative team.
Stable yet growing SaaS product offering an agile environment and strong technical challenges.