Responsibilities
- Partner with engineering, platform, and business-aligned teams to improve the reliability and performance of critical financial services applications.
- Define, measure, and manage SLIs, SLOs, and error budgets to guide data-driven reliability improvements.
- Lead and support incident management activities, participating in a 24x7 on-call rotation and driving effective post-incident reviews.
- Build automation, self-healing capabilities, and operational tooling that reduce manual intervention and improve service recovery times.
- Analyse application, infrastructure, and platform performance to identify reliability risks and deliver continuous improvements across the technology estate.
- Collaborate with an India-based team of engineers.
Requirements
- Proven experience in Site Reliability Engineering, Production Engineering, DevOps, Platform Engineering, or a similar operationally focused role.
- Strong knowledge of observability, monitoring, incident management, reliability engineering, and service operations best practices.
- Experience supporting business-critical applications within complex enterprise environments.
- Hands-on experience with automation, scripting, infrastructure management, and cloud or hybrid technology platforms.
- Excellent communication skills with the ability to collaborate effectively with engineering teams, operational stakeholders, and business partners.
Benefits
- Flexible collaboration model based on a B2B contract.
- Opportunity to work on diverse projects.
