Zeta

Principal Site Reliability Engineer I / II

9 months ago
Hyderābād, India
Staff+
H1B Sponsor

Responsibilities

  • Ensure the reliability of software systems by designing and maintaining scalable infrastructure.
  • Develop automation tools and scripts to streamline operational tasks.
  • Monitor system performance and respond to incidents to minimize downtime.
  • Analyze system usage patterns for capacity planning.
  • Identify and address performance bottlenecks in software systems.
  • Implement infrastructure as code practices using tools like Terraform.
  • Maintain monitoring and logging solutions for system insights.
  • Collaborate with security teams to implement security best practices.
  • Develop and maintain disaster recovery plans.
  • Continuously analyze system performance for improvement opportunities.
  • Provide mentorship and coaching to team members.

Requirements

  • 10 to 15 years of experience in site reliability engineering.
  • B.Tech/M.Tech in computer science, information technology, or a related field.
  • Experience in a product organization is a plus.
  • Certifications from cloud service providers like AWS, Google Cloud, or Microsoft are a plus.
  • Proficiency in programming and scripting languages such as Python, Go, or shell (Bash).
  • Strong automation skills using tools like Ansible or Terraform.
  • Experience with containerization technologies like Docker and Kubernetes.
  • Proficiency in cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with monitoring tools like Prometheus or Grafana.
  • Understanding of networking concepts and security best practices.

Tech Stack

Ansible, AWS, Azure, Bash, Chef, Docker, Git, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Puppet, Python, Terraform

Categories

DevOps, Security