Algolia

Senior Site Reliability Engineer, AI Research

about 1 month ago
Remote, Australia
Senior
H1B Sponsor

Responsibilities

  • Support and evolve the reliability of platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Design infrastructure and operational patterns that prioritize iteration speed.
  • Work closely with researchers and engineers in a cross-functional setting.
  • Participate directly in team planning and execution.
  • Help researchers self-serve infrastructure safely and effectively.
  • Build and maintain Kubernetes-based services on GCP using infrastructure-as-code.
  • Own and improve CI/CD pipelines for services primarily written in Go.
  • Design and operate observability systems using tools such as Datadog.
  • Participate in an on-call rotation and respond to incidents.

Requirements

  • Strong experience operating cloud-first infrastructure.
  • Hands-on experience running production services on Kubernetes.
  • Proficiency with infrastructure-as-code (Terraform) and CI/CD systems.
  • Experience supporting production services written in Go.
  • Solid grounding in service reliability, incident response, and operational best practices.
  • Comfort working in ambiguous environments.

Benefits

  • High-impact work that enables new AI-powered capabilities.
  • High agency in shaping what gets built and how.
  • Collaboration with experienced SREs, engineers, and PhD researchers.
  • Opportunities for growth in research-adjacent infrastructure.
  • Flexible workplace model with remote-friendly culture.

Tech Stack

Datadog, Go, Google Cloud Platform, Kubernetes, Python, Terraform

Categories

AI & ML, DevOps