Coupang

Senior Staff Cloud Backend Engineer - Observability and Site Reliability

1 day ago
Bengaluru, India
Senior / Staff+
H1B Sponsor

Responsibilities

  • Design, implement, and maintain observability solutions for datacenter infrastructure.
  • Develop, deploy, and operate large-scale observability and telemetry platforms.
  • Own and contribute to the full lifecycle of observability services.
  • Build and enhance monitoring systems for high availability and performance.
  • Create and manage dashboards, alerts, and reports for system health visibility.
  • Apply SRE principles to improve reliability and operational efficiency.
  • Develop and maintain automation for infrastructure provisioning and management.
  • Lead root cause analysis and post-incident reviews.
  • Analyze system performance to identify bottlenecks and improvement areas.
  • Partner with cross-functional teams to deliver effective observability solutions.
  • Ensure solutions adhere to security policies and industry standards.
  • Provide hands-on support for observability and reliability issues.
  • Continuously enhance the scalability and operational efficiency of services.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 12+ years of progressive software engineering experience.
  • Proven experience in managing and optimizing large-scale datacenter environments.
  • Strong proficiency in Go or Python with a deep understanding of networked systems.
  • Expert-level knowledge of Kubernetes internals and containerization ecosystems.
  • Proven experience with load balancing and service mesh at scale.
  • Proficiency in observability tools like Prometheus and Grafana.
  • Experience with SRE practices and tools such as Kubernetes and Terraform.
  • Familiarity with cloud platforms like AWS, Azure, or GCP.

Benefits

  • Hybrid work model with the flexibility to work from home 2 days a week.
  • Collaborative culture that enriches the employee experience.

Tech Stack

AWS, Azure, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform