Okta

Staff Site Reliability Engineer

about 24 hours ago
Bengaluru, India
Staff+
H1B Sponsor

Responsibilities

  • Design, build, and operate scalable, reliable, and secure infrastructure across AWS and GCP.
  • Lead reliability and modernization initiatives, including container platform migrations.
  • Serve as a technical authority in Kubernetes and cloud infrastructure.
  • Partner with development teams to architect microservice-based applications.
  • Implement and manage infrastructure as code using Terraform and Ansible.
  • Drive improvements in observability, performance, and cost efficiency.
  • Champion SRE best practices and conduct blameless postmortems.
  • Lead complex technical projects from conception to completion.
  • Mentor engineers and foster a culture of reliability and automation.
  • Collaborate with security and compliance partners to ensure best practices.
  • Participate in the on-call rotation to enhance systems and processes.

Requirements

  • 8+ years in SRE, DevOps, or Infrastructure Engineering roles.
  • 3–5 years of experience with Kubernetes (EKS/GKE) in production.
  • 3–5 years of experience with AWS and GCP.
  • 3–5 years using Terraform for multi-cloud infrastructure management.
  • 5+ years of coding experience in Python, Go, or similar languages.
  • Proven experience leading ECS to EKS/GKE migrations.
  • Experience implementing SLOs/SLIs and improving operational resilience.
  • Strong Linux and security fundamentals.
  • Bachelor’s degree in Computer Science or equivalent experience.

Tech Stack

Ansible, AWS, GitLab CI/CD, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, MySQL, PostgreSQL, Prometheus, Python, Redis, Spinnaker, Terraform

Categories