GrepJob
AlphaSense

Staff Site Reliability Engineer

AlphaSense
Apply
7 days ago
Remote, United StatesStaff+
H1B Sponsor

Base Salary

$150k - $225k/yr

Responsibilities

  • Architect reliability frameworks and self-service tooling for teams.
  • Drive AI-driven reliability through automation of diagnostics and proactive failure prevention.
  • Embed SRE practices across engineering via design reviews and operational standards.
  • Act as Incident Commander during critical events and lead blameless postmortems.
  • Deliver end-to-end monitoring and profiling to optimize performance.
  • Mentor engineers across SRE and product teams through knowledge sharing.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3+ years in a Senior+ SRE position.
  • Strong background in running production SaaS systems at scale.
  • Proficiency in at least one programming/scripting language (Python, Go, or similar).
  • Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes.
  • Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing).
  • Experience with monitoring and alerting tools (Prometheus, Grafana, Datadog, ELK).
  • Familiarity with advanced observability techniques (OTEL, continuous profiling).
  • Proven incident management experience, including leading high-severity incidents.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.

Tech Stack

AWSAzureDatadogGoGoogle Cloud PlatformGrafanaKubernetesPrometheusPython

Categories