AlphaSense

Staff Site Reliability Engineer

about 1 month ago
Pune, India
Staff+
H1B Sponsor

Responsibilities

  • Architect frameworks and self-service tooling for service reliability.
  • Drive the AIOps strategy for automated diagnostics and proactive failure prevention.
  • Embed SRE practices across engineering through design reviews and operational standards.
  • Act as Incident Commander during critical events and lead blameless postmortems.
  • Deliver end-to-end monitoring and profiling to optimize system performance.
  • Mentor engineers across SRE and product teams through technical guidance.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3 years in a Senior+ SRE position.
  • Strong background in running production SaaS systems at scale.
  • Proficiency in at least one programming/scripting language like Python or Go.
  • Hands-on expertise with cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of networking fundamentals including TCP/IP and DNS.
  • Experience with monitoring and alerting tools like Prometheus and Grafana.
  • Familiarity with advanced observability tools like OpenTelemetry (OTel).
  • Proven incident management experience with high-severity incidents.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.

Tech Stack

AWS, Azure, Datadog, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python
