7 months ago
Base Salary
$230k - $390k/yr
Responsibilities
- Own Sierra’s observability stack for monitoring, alerting, logging, and tracing.
- Collaborate with product and platform engineers to design reliable and scalable systems.
- Design and implement scalable, reliable, and secure cloud infrastructure using Terraform and AWS.
- Enhance the reliability and scalability of LLM deployments.
- Lead improvements to deployment pipelines, CI/CD tooling, and incident management processes.
- Define and influence SRE practices and culture across the engineering organization.
Requirements
- 5+ years of experience in Site Reliability or Infrastructure engineering roles.
- Experience designing for availability, scalability, and reliability.
- Deep knowledge of Terraform, AWS services, and cloud networking.
- Strong background in observability systems like Prometheus or Grafana.
- Experience with enterprise customers and their compliance needs.
- Degree in Computer Science or equivalent professional experience.
Benefits
- Flexible (unlimited) paid time off.
- Medical, dental, and vision benefits for you and your family.
- Life insurance and disability benefits.
- Retirement plan, depending on country of employment.
- Parental leave and fertility benefits.
- Lunch and snacks provided.
- Discretionary benefit stipend.
- Free alphorn lessons.
