Lead Site Reliability Engineer
Zeta
about 11 hours ago
Bengaluru, India
Senior / Staff+
H1B Sponsor
Responsibilities
- Ensure the reliability of software systems by designing and maintaining scalable infrastructure.
- Develop automation tools and scripts to streamline operational tasks.
- Monitor system performance and respond to incidents to minimize downtime.
- Analyze system usage patterns and forecast future capacity needs.
- Identify and address performance bottlenecks in software systems.
- Implement infrastructure as code practices using tools like Terraform.
- Maintain monitoring and logging solutions for system insights.
- Collaborate with security teams to implement security best practices.
- Develop and maintain disaster recovery plans.
- Continuously analyze system performance and implement improvements.
- Lead and motivate a team of SREs.
- Provide mentorship and coaching to team members.
- Resolve conflicts and address challenges within the team.
Requirements
- 7 to 9 years of experience in site reliability engineering.
- B.Tech/M.Tech in computer science, information technology, or a related field.
- Experience working for a product organization is a plus.
- Certifications from cloud service providers like AWS, Google Cloud, or Microsoft are a plus.
- Proficiency in programming or scripting languages such as Python, Go, or shell (for example, Bash).
- Strong automation skills using tools like Ansible or Terraform.
- Experience with containerization technologies like Docker and Kubernetes.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with monitoring tools like Prometheus or Grafana.
- Understanding of networking concepts and security best practices.
Tech Stack
Ansible, AWS, Azure, Bash, Chef, Docker, Git, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform
Categories
DevOps, Security