Base Salary
$200k - $400k/yr
Responsibilities
- Own the reliability and scalability of production systems handling social data and AI workloads.
- Define and drive SLOs, SLIs, and error budgets, and build observability and alerting practices.
- Lead incident response and run blameless postmortems that drive systemic improvements.
- Improve performance, cost efficiency, and capacity planning across cloud infrastructure.
- Harden infrastructure-as-code, deployment, and CI/CD pipelines for resilience.
- Partner with engineering teams to embed reliability into system design.
Requirements
- 5+ years of experience operating production systems as an SRE or as an infrastructure or platform engineer.
- Experience scaling databases, data infrastructure, or complex production platforms under load.
- Hands-on expertise with cloud infrastructure (AWS or similar) and infrastructure-as-code tooling.
- Solid programming skills for building automation, tooling, and operational services.
- Comfortable operating in fast-moving startup environments with high ownership and autonomy.
- A reliability-first mindset balanced with pragmatism about velocity and cost.
Benefits
- Competitive compensation and early equity.
- Health, vision, and dental benefits, plus a 401(k) match.
- Clear career growth opportunities as the company scales.
- Free lunch on University Ave. in the heart of Palo Alto.
- Deep exposure to cutting-edge AI tooling and the opportunity to shape its use.
- A collaborative, ambitious team defining a new category of AI-native marketing infrastructure.
