Aisle

Agentic Site Reliability Engineer

16 days ago
Remote, United States
Senior / Staff+
H1B Sponsor

Responsibilities

  • Design the agentic SRE platform for signal ingestion and context aggregation.
  • Build agents for vulnerability triage, incident response, and root-cause analysis.
  • Define trust boundaries for agent actions and human approvals.
  • Treat the agentic system as a production system with SLOs and observability.
  • Automate the vulnerability management loop from identification to closure.
  • Build an agent stack for incident response that generates actionable insights.
  • Create a layer on top of the observability stack that produces prioritized narratives.
  • Drive hardening, patching, and access control efforts.
  • Collaborate with engineering and security teams for seamless integration.
  • Conduct post-incident and post-remediation reviews with agent-generated drafts.

Requirements

  • 8+ years of experience in Site Reliability Engineering, Security Operations, or related fields.
  • Production-grade experience with vulnerability management and incident response.
  • Strong fundamentals in systems, networking, and observability platforms.
  • Hands-on experience with LLMs and agentic systems.
  • Proficiency in designing and writing software with AI assistance.
  • Strong written communication skills for explaining agent actions to humans.
  • A clear opinion on how to balance automation with human oversight.

Tech Stack

  • Datadog
  • Grafana
  • Prometheus