16 days ago
Remote, United States
Senior / Staff+
H1B Sponsor
Responsibilities
- Design the agentic SRE platform for signal ingestion and context aggregation.
- Build agents for vulnerability triage, incident response, and root-cause analysis.
- Define trust boundaries for agent actions and human approvals.
- Treat the agentic system as a production system with SLOs and observability.
- Automate the vulnerability management loop from identification to closure.
- Build an agent stack for incident response that generates actionable insights.
- Create a layer on top of the observability stack that produces prioritized narratives.
- Drive hardening, patching, and access control efforts.
- Collaborate with engineering and security teams for seamless integration.
- Conduct post-incident and post-remediation reviews with agent-generated drafts.
Requirements
- 8+ years of experience in Site Reliability Engineering, Security Operations, or related fields.
- Production-grade experience with vulnerability management and incident response.
- Strong fundamentals in systems, networking, and observability platforms.
- Hands-on experience with LLMs and agentic systems.
- Proficiency in designing and writing software with AI assistance.
- Strong written communication skills for explaining agent actions to humans.
- A clear opinion on automation and on maintaining human oversight.
