Lead Service Reliability Engineer
ThoughtWorks
3 months ago
Singapore, Singapore
Mid Level / Senior / Staff+
H1B Sponsor
Responsibilities
- Understand SRE goals from both technical and business perspectives.
- Provide solutions to improve reliability and fault tolerance.
- Enhance the incident management process and develop prioritization matrices.
- Manage client stakeholder expectations during production incidents.
- Build trust and relationships with senior client stakeholders.
- Identify opportunities for enhancing system performance and reliability.
- Collaborate with application development leads to recommend system design changes.
- Oversee and mentor other SREs on the team.
Requirements
- Proficient in one or more high-level programming languages such as Python, Golang, or Java.
- Familiar with DevOps and GitOps practices.
- In-depth knowledge of configuration management and Infrastructure as Code tools.
- Expertise in observability and monitoring tools.
- Strong understanding of container-based architecture and orchestration tools.
- Experience in application and infrastructure performance tuning.
- Good understanding of quality gates and chaos engineering concepts.
- Experience with network load balancing and security tech stacks.
- Strong communication and articulation skills in English.
- Excellent problem-solving and analytical skills.
Benefits
- Career development supported by interactive tools and numerous programs.
- A dynamic and inclusive community focused on continuous learning.
Tech Stack
Ansible, CircleCI, Go, Grafana, Graylog, Java, Jenkins, Kubernetes, Prometheus, Python, Ruby, Terraform
Categories
DevOps, Security