
Founding Platform & Reliability Engineer
Embedding VC
about 1 month ago
San Francisco, CA, USA
Senior / Staff+
Responsibilities
- Define and operationalize SLOs/SLIs across critical user journeys.
- Participate in on-call rotation and lead incident response improvements.
- Implement reliability patterns and build health measurement mechanisms.
- Establish end-to-end observability with structured logs and metrics.
- Build deploy safety practices including automated rollbacks and CI/CD gates.
- Own the direction of infrastructure architecture and guide transitions.
- Build cost observability and control primitives.
Requirements
- 5+ years of experience building and operating production systems.
- Strong software engineering skills with the ability to ship production code.
- Cloud-native experience with AWS or GCP, including serverless systems.
- Deep knowledge of observability practices and incident response.
- Ability to design resilient interactions with external dependencies.
- Strong communication skills to convey tradeoffs to non-infra peers.
- Ability to operate with ambiguity and define problems effectively.
Benefits
- Competitive base salary and bonus program.
- Equity offering meaningful ownership in the company.
- High-autonomy, growth-oriented environment.
- Hybrid work setup with Bay Area preference.
- Visa sponsorship available.