Senior Site Reliability Engineer - PSRE
Arcesium LLC
1 day ago
Lisbon, Portugal
Senior / Mid Level
H1B Sponsor
Responsibilities
- Implement and maintain observability (monitoring, logging, and tracing) to proactively detect and prevent issues.
- Build tools and infrastructure that enhance system stability and resilience.
- Troubleshoot live production issues with a focus on rapid incident resolution.
- Manage and recover from platform-wide incidents to minimize downtime.
- Continuously monitor application health and performance, analyzing trends to prevent incidents.
- Collaborate with engineering teams during incident response and reliability initiatives.
- Identify opportunities for automation and improve operational efficiency.
- Contribute to the development and improvement of SRE practices and tools.
Requirements
- Up to 5 years of experience in Site Reliability Engineering, DevOps, or Production Engineering.
- Expertise in incident management, including triaging and resolution of high-severity outages.
- Proficiency in at least one programming language, preferably Python or Java.
- Hands-on experience with Kubernetes for managing containerized applications.
- Cloud experience, preferably with AWS, including services like EC2, S3, and Lambda.
- Excellent communication skills to articulate technical challenges and solutions.
- Strong troubleshooting and problem-solving skills for diagnosing complex production issues.
- Ability to stay calm under pressure and prioritize effectively in fast-moving environments.
- Fluency in English (spoken and written) is required.
- Must have the legal right to work in the country.
Benefits
- Flexible work arrangements (hybrid model) and a casual dress code.
- Opportunity to work on challenging projects in a dynamic, global environment.
- Continuous learning and development opportunities.
- Collaborative and innovative work culture.
- Competitive compensation and benefits package.
- Modern and comfortable office located at Avenida da Liberdade (Lisbon).