Point72

Site Reliability Engineer

19 days ago
Bengaluru, India
Mid Level / Senior

Responsibilities

  • Design and implement automated operational workflows to improve system reliability.
  • Build and maintain observability solutions using tools like Datadog.
  • Partner with development teams to enhance application reliability and performance.
  • Develop and maintain CI/CD pipelines and deployment automation.
  • Engineer scalable solutions for production environments across Linux and Windows.
  • Automate infrastructure and operational tasks using scripting languages.
  • Support and enhance reliability of database platforms like SQL Server and MongoDB.
  • Participate in incident response and drive root cause analysis.
  • Define and enforce SLOs, SLIs, and error budgets.
  • Collaborate with Networking, Platform, and Security teams.

Requirements

  • Strong hands-on experience with Linux and Windows operating systems.
  • Proven experience building automation and tooling using Python or similar languages.
  • Deep understanding of observability and monitoring, preferably with Datadog.
  • Experience with CI/CD pipelines and deployment automation tools.
  • Operational and performance knowledge of SQL Server and MongoDB.
  • Familiarity with cloud platforms like AWS and hybrid architectures.
  • Solid understanding of networking concepts such as DNS and TCP/IP.
  • Experience working closely with application development teams in an SRE or DevOps role.
  • Experience with Kubernetes, OpenShift, and containerized workloads.
  • Knowledge of infrastructure-as-code tools like Terraform or CloudFormation.
  • Experience implementing automated scaling and performance tuning.
  • Background in reliability engineering or DevOps in an enterprise environment.
  • Familiarity with security and compliance considerations in production systems.
  • Strong bias toward automation over manual processes.
  • Focus on improving long-term reliability rather than reactive firefighting.
  • Comfortable owning systems end-to-end and driving improvements.
  • Clear communication skills for effective collaboration across teams.
  • Commitment to the highest ethical standards.

Tech Stack

AWS, Bash, Datadog, GitHub Actions, Jenkins, Kubernetes, Linux, Microsoft SQL Server, MongoDB, OpenShift, PowerShell, Python, Terraform, Windows

Categories