Software Engineer, Site Reliability (SRE)

8 months ago

San Francisco, CA, USA

Senior

Base Salary

$230k - $390k/yr

Responsibilities

Own Sierra’s observability stack for monitoring, alerting, logging, and tracing.
Collaborate with product and platform engineers to design reliable and scalable systems.
Design and implement scalable, reliable, and secure cloud infrastructure using Terraform and AWS.
Enhance the reliability and scalability of LLM deployments.
Lead improvements to deployment pipelines, CI/CD tooling, and incident management processes.
Define and influence SRE practices and culture across the engineering organization.

Requirements

5+ years of experience in Site Reliability or Infrastructure engineering roles.
Experience designing for availability, scalability, and reliability.
Deep knowledge of Terraform, AWS services, and cloud networking.
Strong background in observability systems like Prometheus or Grafana.
Experience with enterprise customers and their compliance needs.
Degree in Computer Science or equivalent professional experience.

Benefits

Flexible (unlimited) paid time off.
Medical, dental, and vision benefits for you and your family.
Life insurance and disability benefits.
Retirement plan dependent on country of employment.
Parental leave and fertility benefits.
Lunch and snacks provided.
Discretionary benefit stipend.
Free alphorn lessons.

Tech Stack

AWS, Datadog, Grafana, Prometheus, Terraform

Categories