9 months ago
Cape Town, South Africa
Mid Level / Senior
Responsibilities
- Ensure high availability and scalability of Robin systems.
- Standardize and implement observability practices across a service-based architecture.
- Design, deploy, and operate infrastructure to support product teams.
- Add automation around manual operational tasks.
- Collaborate with development team leads to optimize build, test, and deployment processes.
- Participate in and improve on-call and incident handling processes.
Requirements
- 3+ years of experience in DevOps or Site Reliability Engineering roles.
- Proficiency in at least one backend programming language, preferably Python.
- Strong knowledge of AWS services and of managing them with Terraform.
- Comfortable troubleshooting across the full stack.
- Knowledge of observability frameworks and tools such as OpenTelemetry and Datadog.
- Excellent problem-solving and communication skills.
- Experience with AI/ML infrastructure deployments is a plus.
Benefits
- Competitive salary.
- Generous equity scheme for all employees.
- 20 days of PTO plus South African public holidays.
- Opportunities for career growth and promotions for high performers.
