UiPath

Principal Site Reliability Engineer

about 1 month ago
Tokyo, Japan
Staff+
H1B Sponsor

Responsibilities

  • Lead Incident Command for high-stakes technical events.
  • Serve as a key escalation point for complex issues.
  • Own the communication life cycle during active incidents.
  • Lead thorough retrospectives and drive automated self-healing solutions.
  • Define and improve service health through SLIs and SLOs.
  • Design automation to reduce manual intervention during incidents.
  • Partner with development teams to promote service reliability.
  • Mentor and support other engineers in SRE best practices.

Requirements

  • 7+ years in SRE, Cloud Operations, or a related technical field.
  • At least 3 years in a lead responder or command-oriented role.
  • Demonstrated ability to remain calm and decisive under pressure.
  • Strong proficiency in Python or Go and understanding of distributed systems.
  • Deep experience with observability tools such as Prometheus and Grafana.
  • Willingness to participate in on-call rotations as an Incident Commander.
  • Proficiency in English and Japanese for effective communication.

Tech Stack

Azure, Go, Grafana, Kubernetes, Prometheus, Python, Terraform

Categories

AI & ML, DevOps