Sage

Senior/Staff Site Reliability Engineer

New York, NY, USA
Senior / Staff+
H-1B visa sponsor

Base Salary

$175k - $230k/yr

Responsibilities

  • Design and evolve system architectures for high availability and scalability.
  • Lead complex incident response efforts and coordinate across engineering teams.
  • Define and implement organization-wide observability practices.
  • Establish and maintain reliability standards, including SLIs and SLOs.
  • Drive automation and infrastructure improvements to reduce operational toil.
  • Partner with engineering teams on system design and architecture reviews.
  • Evolve Sage’s cloud infrastructure to support scalable systems.
  • Operate and improve critical data infrastructure for high availability.
  • Lead capacity planning and auto-scaling efforts.
  • Build internal tooling and platforms to enhance the developer experience.

Requirements

  • 7-12+ years of experience in software engineering or site reliability engineering.
  • Experience with edge or device-based systems, including managing their connectivity.
  • Strong networking fundamentals and experience debugging distributed systems.
  • Experience operating and scaling production databases like PostgreSQL or MySQL.
  • Deep expertise in cloud infrastructure, particularly AWS or Google Cloud.
  • Strong experience in designing highly available systems.
  • Expertise in containerization and orchestration, especially with Kubernetes.
  • Advanced observability and monitoring skills using tools like Datadog.
  • Strong programming ability in languages like Go, Python, or Java.
  • Deep knowledge of infrastructure-as-code practices and tools like Terraform.
  • Ability to influence engineering teams and guide best practices.
  • Strong incident management and production debugging skills.

Benefits

  • Competitive base compensation along with stock options.
  • Fully paid health and dental insurance coverage.
  • Take-as-you-need time-off policy, plus 7 paid holidays.
  • Company-wide winter break during the holidays.
  • Office lunch and a fully stocked snack bar.
  • Up to 2 remote days per week.

Tech Stack

AWS, Datadog, Go, Google Cloud Platform, Grafana, Java, Kubernetes, MySQL, PostgreSQL, Prometheus, Python, Terraform

Categories

Data Engineering, DevOps, Security