Base Salary
$180k - $300k/yr
Responsibilities
- Define infrastructure patterns for observable, controllable, and recoverable multi-agent systems.
- Own and evolve the Infrastructure as Code (IaC) stack using Terraform and Kubernetes across multiple cloud providers.
- Build observability primitives to trace agent decisions and execution paths.
- Design and maintain CI/CD pipelines that give fast feedback from commit to production.
- Establish operational foundations including monitoring, alerting, and incident response.
- Collaborate with engineering teams to meet reliability and compliance requirements.
Requirements
- 5+ years of experience building and operating production infrastructure in DevOps or SRE roles.
- Strong hands-on experience with Terraform.
- Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure).
- Solid experience with Docker and Kubernetes in production environments.
- Experience designing and maintaining CI/CD pipelines using tools like GitHub Actions or GitLab CI.
- Proficiency in scripting languages such as Python or Bash.
- High agency and a proactive approach to problem-solving.
- Genuine curiosity about AI systems and their operational implications.
