Senior Service Reliability Engineer
ThoughtWorks
3 months ago
Singapore, Singapore
Senior
H1B Sponsor
Responsibilities
- Conduct SRE and Disaster Recovery maturity assessments.
- Engineer automation solutions using Ansible to replace manual workflows.
- Own and manage the current manual Disaster Recovery process/pipeline.
- Improve site reliability through fault tolerance mechanisms.
- Drive the integration of observability automation into the CI/CD pipeline.
- Handle production incidents and lead client communication.
- Monitor performance of production systems to meet SLA and SLO targets.
- Advise application development teams on reliability improvements.
- Enhance system observability and implement chaos engineering practices.
- Align site reliability with client goals and high availability targets.
Requirements
- Expertise in Ansible orchestration and advanced strategies.
- Ability to integrate Terraform with Ansible for seamless workflows.
- Hands-on experience with Python, Go, Bash, or PowerShell scripting.
- Working knowledge of at least one public cloud (AWS/Azure/GCP).
- Experience with observability tools and data analysis for RCA.
- Familiarity with DevOps, SRE, and GitOps concepts.
- Knowledge of container technologies and orchestration.
- Understanding of modern architecture and experience with metrics/dashboards.
- Experience designing infrastructure aligned with Cloud Well-Architected principles.
Tech Stack
Ansible, AWS, Azure, Bash, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, PowerShell, Python, Terraform
Categories
Backend, DevOps