Senior Site Reliability Engineer

about 2 months ago

Palo Alto, CA, USA

Senior

H-1B Sponsor

Base Salary

$200k - $400k/yr

Responsibilities

Own the reliability and scalability of production systems handling social data and AI workloads.
Define and drive SLOs, SLIs, and error budgets, and build observability and alerting practices.
Lead incident response and blameless postmortems, and drive the systemic improvements they identify.
Improve performance, cost efficiency, and capacity planning across cloud infrastructure.
Harden infrastructure-as-code, deployment, and CI/CD pipelines for resilience.
Partner with engineering teams to embed reliability into system design.

Qualifications

5+ years of experience operating production systems as an SRE, infrastructure engineer, or platform engineer.
Experience scaling databases, data infrastructure, or complex production platforms under load.
Hands-on expertise with cloud infrastructure (AWS or similar) and infrastructure-as-code tooling.
Solid programming skills for building automation, tooling, and operational services.
Comfortable operating in fast-moving startup environments with high ownership and autonomy.
A reliability-first mindset balanced with pragmatism about velocity and cost.

Benefits

Competitive compensation and early equity.
Health, vision, and dental benefits + 401(k) match.
Clear career growth opportunities as the company scales.
Free lunch on University Ave., in the heart of Palo Alto.
Deep exposure to cutting-edge AI tooling and the opportunity to shape its use.
A collaborative, ambitious team defining a new category of AI-native marketing infrastructure.