
Site Reliability Engineer (SRE)

xAI
13 days ago
London, United Kingdom
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Develop and maintain backend services for AI products.
  • Ensure services are scalable and reliable, processing high query volumes.
  • Work with Kubernetes clusters for service hosting.
  • Implement continuous deployment systems.
  • Utilize monitoring technologies to maintain service health.

Requirements

  • Expert knowledge of Kubernetes.
  • Expert knowledge of continuous deployment systems like Buildkite and ArgoCD.
  • Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty.
  • Expert knowledge of infrastructure as code technologies like Pulumi or Terraform.
  • Familiarity with systems programming languages such as Rust, C++, or Go.
  • Experience with traffic management and HTTP proxies like NGINX and Envoy.

Tech Stack

Ambassador, Buildkite, C++, Go, Grafana, Kubernetes, Prometheus, Rust, Terraform

Categories