Infrastructure Stability Architect
OKX
4 months ago
Hong Kong, Hong Kong
Staff+
Responsibilities
- Design and lead the stability architecture for large-scale distributed systems.
- Develop and optimize comprehensive stability strategies.
- Spearhead chaos engineering practices and design fault injection scenarios.
- Build and refine monitoring and alerting systems for fault detection.
- Lead root cause analysis for major incidents and formulate improvement plans.
- Drive infrastructure intelligence and automation with AIOps solutions.
- Collaborate with product, development, and operations teams.
- Lead the development of stability-related technical standards and best practices.
Requirements
- Bachelor's degree or above in Computer Science or a related major.
- More than 10 years of architecture design experience in large-scale platforms.
- Expert knowledge of distributed system architectures.
- In-depth understanding of infrastructure components like Kubernetes and Kafka.
- Strong systems thinking capability for analyzing complex stability issues.
- Extensive experience in handling large-scale system failures.
- Mastery of Linux systems and network technologies.
- Excellent technical leadership skills.
- Proficiency in English and Mandarin.
Benefits
- Competitive total compensation package.
- Learning and development (L&D) programs and education subsidies for employee growth.
- Various team building programs and company events.
- Wellness and meal allowances.
- Comprehensive healthcare schemes for employees and dependants.
- More benefits to be shared during the hiring process.
Tech Stack
Alibaba Cloud, Apache Kafka, Kubernetes, Linux
Categories
Data Engineering, DevOps