Senior Site Reliability Engineer, AI Research
Algolia
about 1 month ago
Remote, Australia
Senior
H1B Sponsor
Responsibilities
- Support and evolve the reliability of platforms used by the AI Research team.
- Ensure production services meet expectations for availability, latency, and operational readiness.
- Design infrastructure and operational patterns that prioritize iteration speed.
- Work closely with researchers and engineers in a cross-functional setting.
- Participate directly in team planning and execution.
- Help researchers self-serve infrastructure safely and effectively.
- Build and maintain Kubernetes-based services on GCP using infrastructure-as-code.
- Own and improve CI/CD pipelines for services primarily written in Go.
- Design and operate observability systems using tools such as Datadog.
- Participate in an on-call rotation, responding to incidents.
Requirements
- Strong experience operating cloud-first infrastructure.
- Hands-on experience running production services on Kubernetes.
- Proficiency with infrastructure-as-code (Terraform) and CI/CD systems.
- Experience supporting production services written in Go.
- Solid grounding in service reliability, incident response, and operational best practices.
- Comfort working in ambiguous environments.
Benefits
- High-impact work that enables new AI-powered capabilities.
- High agency in shaping what gets built and how.
- Collaboration with experienced SREs, engineers, and PhD researchers.
- Opportunities for growth in research-adjacent infrastructure.
- Flexible workplace model with a remote-friendly culture.
Tech Stack
Datadog, Go, Google Cloud Platform, Kubernetes, Python, Terraform
Categories
AI & ML, DevOps