Bolt.new

Staff Site Reliability Engineer

Remote, Worldwide · Staff+

Responsibilities

  • Partner with development teams throughout the project lifecycle to ensure reliability is designed in from the start.
  • Establish and evolve design reviews, launch checklists, and operational acceptance criteria.
  • Define meaningful SLIs, SLOs, and error budgets to guide prioritization decisions.
  • Create frameworks and tooling across AWS, GCP, and Azure that make reliable engineering practices easier to adopt.
  • Influence roadmaps and resolve technical disagreements across teams.
  • Lead incident management practices and facilitate blameless postmortems.
  • Build relationships with cloud providers to influence their product roadmaps and capabilities.

Requirements

  • General fluency across AWS, GCP, and Azure.
  • Comfort with TypeScript and Ruby on Rails.
  • Significant experience as an SRE or production engineer focused on reliability.
  • Strong software engineering fundamentals and the ability to write production-quality code.
  • A proven track record of influencing teams and processes without formal authority.
  • Ability to drive ambiguous, high-scope problems to completion with minimal oversight.
  • Experience identifying process and technical debt and proposing solutions to it.
  • Strong verbal and written communication skills in English.