Principal Site Reliability Engineer I

Hyderābād, India

Staff+

H1B Sponsor

Responsibilities

Ensure the reliability of software systems by designing and maintaining scalable infrastructure.
Develop automation tools and scripts to streamline operational tasks.
Monitor system performance and respond to incidents promptly.
Analyze system usage patterns and forecast future capacity needs.
Identify and address performance bottlenecks in software systems.
Implement infrastructure as code practices using tools like Terraform.
Maintain monitoring and logging solutions that provide insight into system health and behavior.
Collaborate with security teams to implement security best practices.
Develop and maintain disaster recovery plans.
Continuously analyze system performance for improvement opportunities.
Provide mentorship and coaching to team members.

Qualifications

10-15 years of experience in site reliability engineering.
B.Tech/M.Tech in computer science, information technology, or a related field.
Experience working for a product organization is a plus.
Certifications from cloud service providers like AWS or Google Cloud are a plus.
Proficiency in programming and scripting languages such as Python, Go, or shell (e.g., Bash).
Strong automation skills using tools like Ansible or Terraform.
Experience with containerization technologies like Docker and Kubernetes.
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud.
Familiarity with monitoring tools like Prometheus or Grafana.
Understanding of networking concepts and security best practices.