7 months ago
Barcelona, Spain
Mid Level / Senior
Responsibilities
- Own the reliability, security, and infrastructure for the AI operations platform.
- Manage sandbox infrastructure and capacity for AI agents.
- Build and implement observability and incident response practices.
- Automate infrastructure management using IaC tools.
- Lead structured incident responses and drive root cause analysis.
- Make decisions on reliability investments and automation priorities.
Requirements
- Experience designing and operating distributed systems at scale.
- Strong security mindset with experience managing enterprise data.
- Proficiency in observability and incident response practices.
- Familiarity with infrastructure as code and automation tools.
- Ability to take ownership of reliability projects from proposal to production.
- Judgment on where to invest in reliability versus shipping speed.
Benefits
- Unlimited AI budget for tools and automation.
- Autonomy to pursue personal and professional development.
- Opportunity to work on a real AI product with enterprise customers.
- Collaborative team environment that values ownership and feedback.
