Founding Platform & Reliability Engineer

3 months ago

San Francisco, CA, USA

Senior / Staff+

Responsibilities

Define and operationalize SLOs/SLIs across critical user journeys.
Participate in on-call rotation and lead incident response improvements.
Implement reliability patterns and build health measurement mechanisms.
Establish end-to-end observability with structured logs and metrics.
Build deploy safety practices including automated rollbacks and CI/CD gates.
Own the direction of infrastructure architecture and guide transitions.
Build cost observability and control primitives.

Requirements

5+ years of experience building and operating production systems.
Strong software engineering skills with the ability to ship production code.
Cloud-native experience with AWS or GCP, including serverless systems.
Deep knowledge of observability practices and incident response.
Ability to design resilient interactions with external dependencies.
Strong communication skills to convey tradeoffs to non-infra peers.
Ability to operate with ambiguity and define problems effectively.

Benefits

Competitive base salary and bonus program.
Equity offering meaningful ownership in the company.
High-autonomy, high-growth environment.
Hybrid work setup with Bay Area preference.
Visa sponsorship available.

Tech Stack

Firebase, Google Cloud Platform, Next.js, Node.js, Python, React, Redis, TypeScript

Categories

AI & ML, Backend, DevOps