Agoda

Lead DevOps Engineer (Bangkok based, relocation provided)

about 1 month ago
Bangkok, Thailand
Mid Level / Senior / Staff+

Responsibilities

  • Lead the technical vision and execution of new SRE platforms.
  • Define and promote SRE best practices across Agoda’s services.
  • Design, build, and operate reliability platforms.
  • Own safe deployment strategies integrated with monitoring.
  • Identify and mitigate reliability and scaling risks.
  • Improve system resilience by partnering with platform and operations teams.
  • Lead major incident response and operational excellence.
  • Maintain and evolve incident and observability tooling.
  • Advance platform observability and reliability signals.
  • Define reliability roadmaps and OKRs.

Requirements

  • Demonstrated ownership of architecting and operating mission-critical production systems.
  • Proven ability to lead complex cross-team initiatives.
  • Expertise in programming languages such as Go, Python, Rust, or Java.
  • Deep hands-on experience with the Kubernetes ecosystem.
  • Observability and monitoring expertise using Prometheus and Grafana.
  • Strong incident management lifecycle experience.
  • Experience with reliability engineering patterns.
  • Solid data analysis skills, including SQL.
  • Data-driven mindset for analyzing complex problems.
  • Excellent communication and collaboration skills.

Tech Stack

Argo CD, Go, Grafana, Istio, Java, Kubernetes, Microsoft SQL Server, PostgreSQL, Prometheus, Python, Rust, SQL

Categories

DevOps