Agoda

Lead Software Engineer, DevOps Platform (Bangkok based, relocation provided)

about 4 hours ago
Bangkok, Thailand
Senior / Staff+

Responsibilities

  • Lead the technical vision and execution of new SRE platforms.
  • Define and promote SRE best practices across Agoda’s services.
  • Design, build, and operate reliability platforms.
  • Own safe deployment strategies such as canary releases and automated rollback.
  • Proactively identify and mitigate reliability and scaling risks.
  • Improve system resilience by partnering with platform and operations teams.
  • Lead major incident response and operational excellence.
  • Maintain and evolve incident, observability, alerting, and on-call tooling.
  • Advance platform observability and reliability signals using Prometheus and Grafana.
  • Define reliability roadmaps and OKRs.

Requirements

  • 8+ years of relevant experience.
  • Demonstrated ownership of architecting and operating mission-critical production systems.
  • Proven ability to lead complex cross-team initiatives.
  • Expertise in programming languages such as Go, Python, Rust, or Java.
  • Deep hands-on experience with the Kubernetes ecosystem.
  • Observability and monitoring expertise using Prometheus and Grafana.
  • Strong incident management lifecycle experience.
  • Experience with reliability engineering patterns.
  • Solid data analysis skills, including SQL.
  • Excellent communication and collaboration skills.
  • Curiosity and a continuous-learning mindset.

Tech Stack

Argo CD, Go, Grafana, Istio, Java, Kubernetes, Microsoft SQL Server, PostgreSQL, Prometheus, Python, Rust, SQL

Categories