ThoughtWorks

Lead Service Reliability Engineer

3 months ago
Singapore, Singapore
Mid Level / Senior / Staff+
H1B Sponsor

Responsibilities

  • Understand SRE goals from both technical and business perspectives.
  • Provide solutions to improve reliability and fault tolerance.
  • Enhance the incident management process and develop prioritization matrices.
  • Manage client stakeholder expectations during production incidents.
  • Build trust and relationships with senior client stakeholders.
  • Identify opportunities for enhancing system performance and reliability.
  • Collaborate with application development leads to recommend system design changes.
  • Oversee and mentor other SREs on the team.

Requirements

  • Proficient in one or more high-level programming languages such as Python, Golang, or Java.
  • Familiar with DevOps and GitOps practices.
  • In-depth knowledge of configuration management and Infrastructure as Code tools.
  • Expertise in observability and monitoring tools.
  • Strong understanding of container-based architecture and orchestration tools.
  • Experience in application and infrastructure performance tuning.
  • Good understanding of quality gates and chaos engineering concepts.
  • Experience with network load balancing and security tech stacks.
  • Strong communication and articulation skills in English.
  • Excellent problem-solving and analytical skills.

Benefits

  • Career development supported by interactive tools and numerous programs.
  • A dynamic and inclusive community focused on continuous learning.

Tech Stack

Ansible, CircleCI, Go, Grafana, Graylog, Java, Jenkins, Kubernetes, Prometheus, Python, Ruby, Terraform

Categories

DevOps, Security