BetterUp

Senior Site Reliability Engineer

5 months ago
Arlington, VA, USA +3 more
Senior / Mid Level
H1B Sponsor

Base Salary

$165k - $230k/yr

Responsibilities

  • Leverage AI-powered tools and automation for monitoring and maintaining production systems.
  • Build and operate cloud infrastructure on AWS using Terraform.
  • Manage and scale Kubernetes clusters for BetterUp's platform.
  • Design intelligent alerting and observability systems.
  • Collaborate with engineering teams to integrate reliability into the development lifecycle.
  • Automate incident response workflows and create self-healing infrastructure.
  • Experiment with emerging AI tools for log analysis and predictive maintenance.
  • Drive continuous improvement through data-driven retrospectives.

Requirements

  • 4+ years of experience in SRE or infrastructure roles.
  • Genuine excitement about AI tooling and its application.
  • Deep experience with AWS.
  • Hands-on experience with Kubernetes, including deployment and scaling.
  • Strong Terraform skills for managing complex infrastructure.
  • Familiarity with modern observability stacks like Datadog and Prometheus.
  • Strong debugging instincts in distributed systems.
  • Clear communication skills for explaining incidents to various stakeholders.
  • A builder's mindset focused on automation.

Benefits

  • Access to BetterUp coaching for you and a friend or family member.
  • Competitive compensation plan with advancement opportunities.
  • Medical, dental, and vision insurance.
  • Flexible paid time off including federal holidays and additional Inner Workdays.
  • Learning and Development stipend.
  • Company-wide Summer and Winter breaks.
  • Year-round charitable contributions on behalf of BetterUp.
  • 401(k) self-contribution options.

Tech Stack

AWS, Datadog, Kubernetes, Prometheus, Terraform
