Harvey

Staff Software Engineer - Site Reliability Engineering (SRE)

about 1 month ago
Bengaluru, India
Staff+

Responsibilities

  • Design, implement, and manage monitoring and infrastructure resources across 50+ global regions.
  • Lead incident management processes, including postmortems and root cause analyses.
  • Automate operational tasks and workflows to maintain high reliability.
  • Establish best practices for security, compliance, and reliability.
  • Optimize infrastructure costs through strategic capacity planning.
  • Provide technical mentorship and leadership to promote these practices across the team.

Requirements

  • 10+ years of experience in Site Reliability Engineering or similar roles.
  • Expertise in infrastructure as code tools like Pulumi, Terraform, or CloudFormation.
  • Familiarity with observability tools and incident response practices.
  • Proficiency with cloud infrastructure platforms such as Azure, GCP, or AWS.
  • Strong programming skills in Python, Bash, Go, or similar languages.
  • Solid understanding of CI/CD, Kubernetes, containerization, and cloud security principles.

Tech Stack

AWS, Azure, Bash, Datadog, Go, Google Cloud Platform, Kubernetes, Python, Terraform