
Staff Site Reliability Engineer
AlphaSense
about 1 month ago
Responsibilities
- Architect frameworks and self-service tooling for service reliability.
- Drive the AIOps strategy for automated diagnostics and proactive failure prevention.
- Embed SRE practices across engineering through design reviews and operational standards.
- Act as Incident Commander during critical events and lead blameless postmortems.
- Deliver end-to-end monitoring and profiling to optimize system performance.
- Mentor engineers across SRE and product teams through technical guidance.
Requirements
- 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
- At least 3 years in a Senior+ SRE position.
- Strong background in running production SaaS systems at scale.
- Proficiency in at least one programming or scripting language, such as Python or Go.
- Hands-on expertise with cloud platforms such as AWS, GCP, or Azure.
- Deep understanding of networking fundamentals including TCP/IP and DNS.
- Experience with monitoring and alerting tools like Prometheus and Grafana.
- Familiarity with advanced observability tools such as OpenTelemetry (OTel).
- Proven incident management experience with high-severity incidents.
- Strong troubleshooting skills across the full stack.
- Excellent communication and collaboration skills.