Airbyte

Site Reliability Engineer

about 2 months ago

Base Salary

$190k - $220k/yr

Responsibilities

  • Own the infrastructure for the Data Replication platform, including Kubernetes clusters and CI/CD pipelines.
  • Collaborate with product engineers to integrate features reliably with infrastructure.
  • Enhance observability, alerting, and anomaly detection with a focus on LLM automation.
  • Develop AI-augmented release and internal tooling for automated deployments and rollbacks.
  • Set infrastructure standards by creating self-serve tooling and coaching engineers.

Requirements

  • 7+ years of experience in infrastructure, platform engineering, SRE, or DevOps.
  • Hands-on experience with Kubernetes, Helm, and Terraform in production.
  • Deep knowledge of observability stacks like Prometheus, Grafana, and Datadog.
  • Experience managing CI/CD pipelines and developer tooling.
  • Ability to read backend code to troubleshoot and instrument systems.
  • Fluency with AI tools and frameworks for automation and debugging.
  • A startup-ready mindset, comfortable with ambiguity and fast-paced problem-solving.

Tech Stack

Airbyte, AWS, Datadog, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

Categories

AI & ML, Data Engineering, DevOps