Site Reliability Engineer

H1B Sponsor

Base Salary

$230k - $310k/yr

Responsibilities

Own the reliability, availability, and performance of Gamma's production systems across AWS.
Build observability infrastructure including metrics, logging, tracing, and alerting.
Design and implement automation to reduce operational toil and improve deployment safety.
Lead incident response and conduct blameless post-mortems to drive systemic improvements.
Collaborate with engineering teams on architecture reviews and reliability best practices.
Manage and optimize compute, networking, databases, and managed services.

Qualifications

5+ years of experience in site reliability engineering, DevOps, or systems engineering, with AWS expertise.
Strong programming skills in Python, Go, or TypeScript/Node.js.
Experience with infrastructure-as-code tools like Terraform or CloudFormation.
Proven track record of improving system reliability through automation and monitoring.
Deep understanding of networking, distributed systems, and containerization technologies.
Sharp incident management instincts and debugging skills for complex production failures.
Experience scaling SaaS products, or familiarity with Kafka and chaos engineering, is a plus.
AWS certifications or experience with security frameworks like SOC 2 is a plus.