Principal Site Reliability Engineer I / II
Zeta
9 months ago
Hyderābād, India
Staff+
H1B Sponsor
Responsibilities
- Ensure the reliability of software systems by designing and maintaining scalable infrastructure.
- Develop automation tools and scripts to streamline operational tasks.
- Monitor system performance and respond to incidents to minimize downtime.
- Analyze system usage patterns for capacity planning.
- Identify and address performance bottlenecks in software systems.
- Implement infrastructure-as-code practices using tools like Terraform.
- Maintain monitoring and logging solutions for system insights.
- Collaborate with security teams to implement security best practices.
- Develop and maintain disaster recovery plans.
- Continuously analyze system performance for improvement opportunities.
- Provide mentorship and coaching to team members.
Requirements
- 10 to 15 years of experience in site reliability engineering.
- B.Tech/M.Tech in computer science, information technology, or a related field.
- Experience in a product organization is a plus.
- Certifications from cloud service providers like AWS, Google Cloud, or Microsoft are a plus.
- Proficiency in programming and scripting languages such as Python, Go, or Bash (shell).
- Strong automation skills using tools like Ansible or Terraform.
- Experience with containerization and orchestration technologies like Docker and Kubernetes.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with monitoring tools like Prometheus or Grafana.
- Understanding of networking concepts and security best practices.
Tech Stack
Ansible, AWS, Azure, Bash, Chef, Docker, Git, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform
Categories
DevOps, Security