AlphaSense

Staff Site Reliability Engineer

about 1 month ago
Bengaluru, India
Staff+
H1B Sponsor

Responsibilities

  • Architect frameworks and self-service tooling for service reliability.
  • Drive the AIOps strategy for automated diagnostics and proactive failure prevention.
  • Embed SRE practices across engineering through design reviews and operational standards.
  • Act as Incident Commander during critical events and lead blameless postmortems.
  • Deliver end-to-end monitoring and profiling to optimize performance.
  • Mentor engineers across SRE and product teams through technical guidance.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3 years in a Senior+ SRE position.
  • Strong background in running production SaaS systems at scale.
  • Proficiency in at least one programming/scripting language (Python, Go, etc.).
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes.
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing).
  • Experience with monitoring and alerting tools (Prometheus, Grafana, Datadog, ELK).
  • Familiarity with advanced observability tools (OpenTelemetry, continuous profiling).
  • Proven incident management experience, including leading high-severity incidents.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.

Tech Stack

AWS, Azure, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python

Categories