Staff Software Engineer, Site Reliability Engineer (SRE)

7 months ago

San Francisco, CA, USA

Staff+

H1B Sponsor

Base Salary

$238k - $290k/yr

Responsibilities

Design, implement, and manage monitoring and infrastructure resources across 50+ global regions.
Lead incident management processes, including postmortems and root cause analyses.
Automate operational tasks and workflows to maintain high reliability.
Establish best practices for security, compliance, and reliability.
Optimize infrastructure costs through strategic capacity planning.
Provide technical mentorship and leadership to promote best practices.

Requirements

10+ years of experience in Site Reliability Engineering or similar roles.
Expertise in infrastructure as code tools like Pulumi, Terraform, or CloudFormation.
Deep familiarity with observability tools and incident response practices.
Proficiency with cloud infrastructure platforms such as Azure, GCP, or AWS.
Strong programming skills in Python, Bash, Go, or similar languages.
Proven track record of diagnosing complex system problems.

Benefits

In-person work model with relocation assistance for new employees.

Tech Stack

AWS, Azure, Bash, Datadog, Go, Google Cloud Platform, Kubernetes, Python, Terraform

Categories

DevOps, Security