PhonePe

Site Reliability Engineer 2 - BigData

about 5 hours ago
Bengaluru, India
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Manage, maintain, and support incremental changes to Linux/Unix environments.
  • Lead on-call rotations and incident responses, conducting root cause analysis.
  • Design and implement automation systems for managing big data infrastructure.
  • Troubleshoot and resolve complex production issues while identifying root causes.
  • Design and review scalable and reliable system architectures.
  • Collaborate with teams to optimize overall system/cluster performance.
  • Enforce security standards across systems and infrastructure.
  • Ensure availability, performance, and scalability of systems through proactive monitoring.
  • Develop tools and scripts to automate operational processes.
  • Monitor and optimize system performance and resource usage.
  • Collaborate with development teams to integrate best practices into the software development lifecycle.
  • Stay informed of industry technology trends and contribute to technology communities.
  • Develop and enforce SRE best practices and principles.
  • Align across functional teams on priorities and deliverables.
  • Drive automation to enhance operational efficiency.

Requirements

  • Over 4 years of experience managing and maintaining distributed big data ecosystems.
  • Strong expertise in Linux, including IP networking, iptables, and IPsec.
  • Proficiency in scripting/programming with languages like Perl, Golang, or Python.
  • Hands-on experience with the Hadoop stack (HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot).
  • Familiarity with open-source configuration management and deployment tools.
  • Solid understanding of networking, open-source technologies, and related tools.
  • Excellent communication and collaboration skills.
  • Experience with DevOps tools such as SaltStack, Ansible, Docker, and Git.
  • Familiarity with SRE logging and monitoring tools such as the ELK stack, Grafana, and Prometheus.

Benefits

  • Medical, Critical Illness, Accidental, and Life Insurance.
  • Employee Assistance Program and Onsite Medical Center.
  • Maternity and Paternity Benefits, Adoption Assistance, and Day-care Support.
  • Relocation benefits and Transfer Support Policy.
  • Retirement benefits including Employee PF Contribution and Gratuity.
  • Higher Education Assistance and Car Lease options.

Tech Stack

Ansible, Apache Airflow, Apache Hadoop, Apache HBase, Apache Kafka, AWS, Azure, Chef, Docker, Git, Go, Google Cloud Platform, Grafana, Linux, Perl, Prometheus, Puppet, Python, YARN

Categories

Data Engineering, DevOps