Responsibilities
- Design and automate large-scale distributed systems.
- Build tools and automation for higher availability and efficiency.
- Collaborate with engineering teams to deliver high-quality software.
- Monitor production environments and implement preventive measures.
- Work with delivery teams on software improvements that raise availability and reduce mean time to detect (MTTD).
- Participate in an on-call rotation every 5-6 weeks.
Requirements
- 7+ years of public cloud experience (GCP or AWS).
- 7+ years of private cloud experience (OpenStack, VMware ESXi).
- 7+ years of Linux experience (CentOS/Ubuntu).
- 7+ years of networking experience.
- Experience with Kubernetes, Helm, GKE, and Docker.
- Familiarity with GitLab, ArgoCD, and CI/CD methodologies and tools.
- Experience with Infrastructure as Code (Terraform) and scripting languages like Python.
- Knowledge of Ansible for configuration management.
- Experience with observability tools like Prometheus, Loki, and Grafana.
- Strong problem-solving and analytical skills.
- Familiarity with incident management processes and SRE best practices.
- Bachelor's degree in Computer Science or equivalent experience.