3 months ago
San Francisco, CA, USA
Senior / Mid Level
Base Salary
$200k - $325k/yr
Responsibilities
- Design, implement, and operate large-scale distributed systems for AI agents and workflow orchestration.
- Write and maintain Terraform modules for cloud infrastructure management.
- Build deployment packages and infrastructure templates for self-hosted installations.
- Provide technical guidance and troubleshooting support to enterprise customers.
- Ensure high availability and reliability of production systems through monitoring and incident response.
- Build internal tools to enhance deployment and operational efficiency for product engineers.
- Collaborate with teams to design scalable architectures for cloud and self-hosted deployment models.
- Profile and optimize system performance across various layers.
- Implement security best practices and ensure compliance for managed and self-hosted deployments.
Requirements
- 3+ years of experience building and operating large-scale distributed systems.
- Strong experience with Terraform for infrastructure provisioning.
- Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure).
- Experience with self-hosted or on-premises software deployments for enterprises.
- Proficiency in Python, Go, or similar languages for automation and tooling.
- Strong understanding of networking, databases, and containerization technologies.
- Experience with monitoring and incident management tools.
- Ability to communicate technical concepts clearly to customers.
- Ability to debug complex system issues and implement solutions.
Benefits
- Be a key player in shaping the success of the product and company.
- Opportunity to build a new AI product offering with support from an experienced team.
- Rapid growth potential within the company.
- Join a culture that values innovation, ownership, accountability, and fun.
