Duvo Inc

Site Reliability Engineer

7 months ago
Barcelona, Spain
Mid Level / Senior

Responsibilities

  • Own the reliability, security, and infrastructure for the AI operations platform.
  • Manage sandbox infrastructure and capacity for AI agents.
  • Build and implement observability and incident response practices.
  • Automate infrastructure management using IaC tools.
  • Lead structured incident responses and drive root cause analysis.
  • Make decisions on reliability investments and automation priorities.

Requirements

  • Experience designing and operating distributed systems at scale.
  • Strong security mindset with experience in managing enterprise data.
  • Proficiency in observability and incident response practices.
  • Familiarity with infrastructure as code and automation tools.
  • Ability to take ownership of reliability projects from proposal to production.
  • Judgment on where to invest in reliability versus shipping speed.

Benefits

  • Unlimited AI budget for tools and automation.
  • Autonomy to pursue personal and professional development.
  • Opportunity to work on a real AI product with enterprise customers.
  • Collaborative team environment that values ownership and feedback.

Tech Stack

Docker, Google Cloud Platform, Grafana, PostgreSQL, Prometheus, Python, Redis, Terraform, TypeScript
