Okta

Principal Site Reliability Engineer

about 6 hours ago
Bengaluru, India
Staff+
H1B Sponsor

Responsibilities

  • Define and drive the reliability strategy for critical product and platform services.
  • Establish standards for availability, resilience, observability, incident management, and operational readiness.
  • Lead architecture reviews for critical services and platform initiatives.
  • Partner with engineering leaders to align reliability objectives with business priorities.
  • Create frameworks and operational guardrails for engineering teams.
  • Guide service architecture toward simplicity, scalability, and operational excellence.
  • Drive initiatives that improve platform maturity and sustainability.
  • Own reliability architecture for the Spera / ISPM product area.
  • Collaborate with engineering leadership to establish reliability objectives.
  • Lead large-scale scalability and performance initiatives.
  • Design, build, and operate large-scale cloud infrastructure and production services.
  • Develop software and automation using Go, Python, and Terraform.
  • Eliminate operational toil through automation and tooling.
  • Mentor Staff and Senior engineers across multiple teams.
  • Lead technical reviews and operational readiness assessments.
  • Drive adoption of reliability engineering best practices across EPG.

Requirements

  • Extensive experience designing and operating large-scale production systems in AWS and/or GCP.
  • Deep expertise with Kubernetes in production environments.
  • Experience designing reliability strategies for Kubernetes-based platforms.
  • Strong software engineering skills in Go and/or Python.
  • Experience with Infrastructure as Code technologies such as Terraform and Helm.
  • Strong understanding of distributed systems architecture and cloud-native application design.
  • Experience operating and troubleshooting distributed data platforms.
  • Strong understanding of cloud security fundamentals and secure infrastructure design.
  • Demonstrated success leading complex technical initiatives across multiple teams.
  • Ability to translate business objectives into technical strategy.

Benefits

  • Immersive, in-person onboarding experience designed to accelerate impact.
  • Support for well-being and social impact initiatives.
  • Opportunities for talent development and community connection.