AlphaSense

Staff Site Reliability Engineer

about 1 month ago
Delhi, India
Staff+
H-1B Sponsor

Responsibilities

  • Architect reliability frameworks and self-service tooling that support service ownership.
  • Drive AIOps strategy for automated diagnostics and proactive failure prevention.
  • Embed SRE practices through design reviews and operational standards.
  • Lead incident management as Incident Commander during critical events.
  • Deliver end-to-end monitoring and profiling to optimize system performance.
  • Mentor engineers across SRE and product teams through knowledge sharing.

Requirements

  • 8+ years of experience in Site Reliability Engineering or similar roles.
  • At least 3 years in a Senior-level (or higher) SRE position.
  • Strong background in running production SaaS systems at scale.
  • Proficiency in programming/scripting languages such as Python or Go.
  • Hands-on expertise with cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of networking fundamentals such as TCP/IP and DNS.
  • Experience with monitoring and alerting tools such as Prometheus and Grafana.
  • Familiarity with advanced observability techniques.
  • Proven experience managing high-severity incidents.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.

Tech Stack

AWS, Azure, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python
