ThoughtWorks

Senior Service Reliability Engineer

3 months ago
Singapore, Singapore
Senior
H1B Sponsor

Responsibilities

  • Conduct SRE and Disaster Recovery maturity assessments.
  • Engineer automation solutions using Ansible to replace manual workflows.
  • Own and manage the current manual Disaster Recovery process/pipeline.
  • Improve site reliability through fault tolerance mechanisms.
  • Drive the integration of observability automation into the CI/CD pipeline.
  • Handle production incidents and lead client communication.
  • Monitor performance of production systems to meet SLA and SLO targets.
  • Advise application development teams on reliability improvements.
  • Enhance system observability and implement chaos engineering practices.
  • Align site reliability with client goals and high availability targets.

Requirements

  • Expertise in Ansible orchestration and advanced strategies.
  • Ability to integrate Terraform with Ansible for seamless workflows.
  • Hands-on experience with Python, Go, Bash, or PowerShell scripting.
  • Working knowledge of at least one public cloud (AWS/Azure/GCP).
  • Experience with observability tools and data analysis for RCA.
  • Familiarity with DevOps, SRE, and GitOps concepts.
  • Knowledge of container technologies and orchestration.
  • Understanding of modern architecture and experience with metrics/dashboards.
  • Experience designing infrastructure aligned with Cloud Well-Architected principles.

Tech Stack

Ansible, AWS, Azure, Bash, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, PowerShell, Python, Terraform

Categories

Backend, DevOps