Upstart

Principal Software Engineer, Site Reliability

14 days ago
Remote, Worldwide
Staff+
H1B Sponsor

Base Salary

$195k - $270k/yr

Responsibilities

  • Lead the definition and adoption of SRE principles across engineering teams.
  • Partner with leadership to shape long-term reliability and observability strategies.
  • Champion distributed tracing and key performance metrics to improve system visibility.
  • Build and scale self-healing systems to minimize manual intervention.
  • Drive improvements to incident response processes, including for Machine Learning systems.
  • Collaborate with Development Productivity and Quality teams to enhance engineering velocity.
  • Influence technical roadmaps through data-driven insights and contributions.
  • Own and deliver cross-functional initiatives from concept through execution.

Requirements

  • 10+ years of experience in Software Engineering and Site Reliability Engineering.
  • Proven track record as an SRE thought leader and evangelist.
  • Strong communication and mentoring skills.
  • Proficiency in Python, Go, and JavaScript/TypeScript.
  • Experience with Infrastructure as Code tools like Terraform and CloudFormation.
  • Expertise in observability and performance monitoring tools.
  • Experience with on-call and incident management.
  • Strong background in automation and building self-healing systems.
  • Hands-on experience applying LLM/GenAI tools to improve SRE processes.
  • Program management skills to drive cross-functional projects.

Benefits

  • Competitive compensation with base pay, bonuses, and equity grants.
  • Generous 401(k) plan with matching contributions.
  • Employee Stock Purchase Plan offering company stock at a discount.
  • Affordable medical, dental, and vision coverage with high employer contribution.
  • Paid time off, sick leave, and company holidays.
  • Paid family and parental leave.
  • Employee Assistance Program offering mental health support.
  • Annual wellness and productivity allowances.
  • Connection through team events and community initiatives.

Tech Stack

Datadog, Go, JavaScript, Prometheus, Python, Terraform, TypeScript

Categories

AI & ML, Data Engineering, DevOps, Full Stack