Robin AI

Site Reliability Engineer

9 months ago
Cape Town, South Africa
Mid Level / Senior

Responsibilities

  • Ensure high availability and scalability of Robin systems.
  • Standardize and implement observability practices in service-based architecture.
  • Design, deploy, and operate infrastructure to support product teams.
  • Add automation around manual operational tasks.
  • Collaborate with development team leads to optimize build, test, and deployment processes.
  • Participate in and improve on-call and incident handling processes.

Requirements

  • 3+ years of experience in DevOps or Site Reliability Engineering roles.
  • Proficiency in at least one backend programming language, preferably Python.
  • Strong knowledge of AWS services, with infrastructure managed through Terraform.
  • Comfortable troubleshooting across the full stack.
  • Knowledge of observability frameworks and tools such as OpenTelemetry and Datadog.
  • Excellent problem-solving and communication skills.
  • Experience with AI/ML infrastructure deployments is a plus.

Benefits

  • Competitive salary.
  • Generous equity scheme for all employees.
  • 20 days PTO plus public holidays in South Africa.
  • Growth opportunities with a focus on promotions for high performers.

Tech Stack

  • AWS
  • Datadog
  • Python
  • Terraform