Responsibilities
- Design, implement, and automate large-scale distributed systems.
- Build tools and automation to improve availability, scalability, and efficiency.
- Collaborate with engineering teams to deliver high-quality software.
- Monitor production and development environments to ensure seamless customer experience.
- Work with delivery teams on software improvements that increase availability and reduce mean time to detect (MTTD).
- Participate in an on-call rotation (8-hour shifts, 7 days a week, every 3-4 weeks).
Requirements
- 7+ years of public cloud experience (GCP or AWS).
- 7+ years of private cloud experience (OpenStack, VMware ESXi).
- 7+ years of Linux experience (CentOS/Ubuntu).
- 7+ years of networking experience.
- Experience with orchestration and management of containers using Kubernetes, Helm, GKE, and Docker.
- Familiarity with GitLab, ArgoCD, and CI/CD methodologies and tools.
- Experience with Infrastructure as Code (Terraform) and with Python or other scripting languages.
- Experience with the Ansible configuration management tool.
- Experience with observability tools like Prometheus, Loki, Alertmanager, and Grafana.
- Excellent problem-solving and analytical skills.
- Experience with incident management processes, SRE best practices, and continuous improvement.
- Experience designing large-scale distributed systems.
- Strong verbal and written communication skills.
- Bachelor of Science degree in Computer Science, or equivalent experience.