Coupang

Sr. Staff Site Reliability Engineer

4 months ago
Seattle, WA, USA
Senior / Staff+
H1B Sponsor

Base Salary

$176k - $221k/yr

Responsibilities

  • Serve as the primary point of contact responsible for the reliability and performance of customer-facing platform services.
  • Gain deep knowledge of Coupang application workflows and dependencies.
  • Define and track key performance indicators (KPIs) and service-level objectives (SLOs).
  • Build incident management processes and automation for fast incident remediation.
  • Develop best practices for monitoring, alerting, and telemetry systems.
  • Automate disaster recovery testing, chaos testing, and load testing.
  • Collaborate with product development teams to ensure scalable and operable designs.
  • Establish guardrails and automation for deploying production changes.
  • Participate in a 24x7 rotation for production issue escalations.
  • Communicate effectively across all levels of the organization.

Requirements

  • Bachelor's degree in computer science, engineering, or a related technical field.
  • 8+ years of experience building and operating large-scale distributed systems.
  • Experience with AI/ML and large-scale web-based Java architectures is preferred.
  • Professional certifications in cloud platforms or monitoring tools are a plus.
  • Deep knowledge of UNIX/Linux systems and administration.
  • Programming skills in Python, Java, Golang, or Ruby.
  • Strong problem-solving and analytical skills across systems and networks.
  • Experience with cloud-based GPU infrastructure on AWS, Azure, or Google Cloud.
  • Understanding of DevOps and SRE practices, including CI/CD and infrastructure as code.
  • Familiarity with containerization and orchestration technologies like Docker and Kubernetes.
  • Excellent communication and collaboration skills.

Benefits

  • Medical, dental, vision, and life insurance.
  • Flexible Spending Accounts (FSA) and Health Savings Account (HSA).
  • Long-term and short-term disability coverage.
  • Employee Assistance Program (EAP).
  • 401(k) plan with company match.
  • 18-21 days of Paid Time Off (PTO) based on tenure.
  • 12 public holidays and paid parental leave.
  • Pre-tax commuter benefits.
  • Free electric car charging station.

Tech Stack

AWS, Azure, Datadog, Docker, Go, Google Cloud Platform, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Ruby

Categories

AI & ML, Data Engineering, DevOps