Responsibilities
- Design and automate large-scale distributed systems.
- Build tools and automation for higher availability and efficiency.
- Collaborate with engineering teams to deliver high-quality software.
- Monitor production environments and implement preventive measures.
- Work with delivery teams on software improvements that raise availability and reduce mean time to detect (MTTD).
- Participate in an on-call rotation every 5-6 weeks.
Requirements
- 7+ years of public cloud experience (GCP or AWS).
- 7+ years of private cloud experience (OpenStack, VMware ESXi).
- 7+ years of Linux experience (CentOS/Ubuntu).
- 7+ years of networking experience.
- Experience with Kubernetes, Helm, GKE, and Docker.
- Familiarity with GitLab, ArgoCD, and CI/CD methodologies.
- Experience with Infrastructure as Code (Terraform) and scripting languages like Python.
- Knowledge of Ansible and observability tools like Prometheus and Grafana.
- Strong problem-solving and analytical skills.
- Experience with Incident Management and SRE best practices.
- Bachelor's degree in Computer Science or equivalent experience.