ThoughtWorks

Senior Service Reliability Engineer

about 16 hours ago
Singapore, Singapore
Senior
H1B Sponsor

Responsibilities

  • Improve site reliability by building fault-tolerant mechanisms and architectures.
  • Drive the integration of observability automation into the CI/CD pipeline.
  • Handle production incidents and manage communication with clients.
  • Monitor the performance of production systems to meet SLA and SLO targets.
  • Advise application development teams on system reliability improvements.
  • Enhance system observability to reduce false alarms and improve efficiency.
  • Implement chaos engineering practices for regular reliability testing.
  • Align site reliability direction with client goals and business needs.

Requirements

  • Hands-on experience in programming and scripting languages such as Python, Go, or Bash.
  • Good understanding of at least one public cloud platform (AWS, Azure, or GCP).
  • Exposure to observability tools like Grafana, Datadog, or ELK Stack.
  • Familiarity with DevOps and GitOps practices.
  • Knowledge of container-based architecture and orchestration tools like Kubernetes.
  • Understanding of technical architecture and modern design patterns.
  • Familiarity with the principles of a cloud provider's Well-Architected Framework.

Benefits

  • Career development supported by interactive tools and numerous programs.
  • A dynamic and inclusive community focused on continuous learning.

Tech Stack

AWS, Azure, Bash, Datadog, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Python

Categories