CodeRabbit

Site Reliability Engineer

4 months ago

Base Salary

$170k–$240k per year

Responsibilities

  • Design, implement, and maintain scalable infrastructure on Google Cloud Platform.
  • Own and operate critical platform services.
  • Build and maintain Infrastructure as Code using Terraform.
  • Establish and maintain SLI/SLO frameworks for critical services.
  • Implement monitoring, alerting, and observability solutions.
  • Conduct incident response and root cause analysis.
  • Optimize application and infrastructure performance.
  • Develop self-service platforms and tooling for engineering teams.
  • Automate operational tasks including scaling and maintenance.
  • Integrate security best practices into infrastructure services.
  • Design secure network architectures and establish disaster recovery procedures.

Requirements

  • 6-8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
  • Proven track record managing production systems at scale.
  • Experience with cloud platforms, particularly AWS or Google Cloud Platform.
  • Strong background in containerization and orchestration platforms.
  • Proficiency in Node.js and TypeScript.
  • Advanced experience with Terraform for infrastructure management.
  • Hands-on experience with monitoring platforms like Datadog.
  • Strong Linux/Unix systems skills.
  • Knowledge of security principles for cloud infrastructure.
  • Familiarity with CI/CD tools and practices.

Benefits

  • Work on cutting-edge technology with real-world impact.
  • Collaborative and innovative environment.
  • Competitive salary, equity, and benefits.
  • Professional development opportunities.

Tech Stack

AWS, Datadog, Docker, GitHub Actions, GitLab CI/CD, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Linux, Node.js, Prometheus, Terraform, TypeScript