OKX

Infrastructure Stability Architect

4 months ago
Hong Kong, Hong Kong
Staff+

Responsibilities

  • Design and lead the stability architecture for large-scale distributed systems.
  • Develop and optimize comprehensive stability strategies.
  • Spearhead chaos engineering practices and design fault injection scenarios.
  • Build and refine monitoring and alerting systems for fault detection.
  • Lead root cause analysis for major incidents and formulate improvement plans.
  • Drive infrastructure intelligence and automation with AIOps solutions.
  • Collaborate with product, development, and operations teams.
  • Lead the development of stability-related technical standards and best practices.

Requirements

  • Bachelor's degree or above in Computer Science or a related major.
  • More than 10 years of architecture design experience on large-scale platforms.
  • Expert knowledge of distributed system architectures.
  • In-depth understanding of infrastructure components such as Kubernetes and Kafka.
  • Strong systems thinking capability for analyzing complex stability issues.
  • Extensive experience in handling large-scale system failures.
  • Mastery of Linux systems and network technologies.
  • Excellent technical leadership skills.
  • Proficiency in English and Mandarin.

Benefits

  • Competitive total compensation package.
  • Learning and development (L&D) programs and an education subsidy to support employee growth.
  • Various team building programs and company events.
  • Wellness and meal allowances.
  • Comprehensive healthcare schemes for employees and dependants.
  • More benefits to be shared during the hiring process.

Tech Stack

  • Alibaba Cloud
  • Apache Kafka
  • Kubernetes
  • Linux

Categories

  • Data Engineering
  • DevOps