Staff, Back-end Engineer (SRE)
Coupang
4 months ago
Seoul, South Korea
Staff+
H1B Sponsor
Responsibilities
- Serve as the primary point of contact responsible for the reliability and performance of customer-facing services.
- Gain deep knowledge of Coupang's application workflows and dependencies.
- Define and track key performance indicators (KPIs) and service-level objectives (SLOs).
- Build incident management processes and automation for fast incident remediation.
- Develop best practices for monitoring, alerting, and telemetry systems.
- Automate disaster recovery testing, chaos testing, and load testing.
- Collaborate with product development teams to ensure scalable and operable designs.
- Establish guardrails and automation for deploying production changes.
- Participate in a 24x7 rotation for production issue escalations.
- Communicate effectively across all levels of the organization.
Requirements
- 10+ years of experience building and operating large-scale distributed systems.
- Experience with SLO/SLA management and implementation.
- Deep knowledge of UNIX/Linux systems and administration.
- Demonstrated programming skills in Python, Java, Golang, or Ruby.
- Strong problem-solving and analytical skills across systems, network, and code.
- Experience with cloud-based GPU infrastructure on AWS, Azure, or Google Cloud.
- Understanding of DevOps and SRE practices, including CI/CD and infrastructure as code.
- Experience with containerization and orchestration technologies like Docker and Kubernetes.
- Excellent communication and collaboration skills.
- Knowledge of OpenTelemetry and observability tools such as Prometheus and Grafana.
Tech Stack
AWS, Azure, Datadog, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Ruby
Categories
Backend, DevOps