GrepJob
Serval

Senior Software Engineer, Infrastructure

3 months ago
San Francisco, CA, USA
Senior / Mid Level

Base Salary

$200k - $325k/yr

Responsibilities

  • Design, implement, and operate large-scale distributed systems for AI agents and workflow orchestration.
  • Write and maintain Terraform modules for cloud infrastructure management.
  • Build deployment packages and infrastructure templates for self-hosted installations.
  • Provide technical guidance and troubleshooting support to enterprise customers.
  • Ensure high availability and reliability of production systems through monitoring and incident response.
  • Build internal tools to enhance deployment and operational efficiency for product engineers.
  • Collaborate with teams to design scalable architectures for cloud and self-hosted models.
  • Profile and optimize system performance across various layers.
  • Implement security best practices and ensure compliance for managed and self-hosted deployments.

Requirements

  • 3+ years of experience in building and operating large-scale distributed systems.
  • Strong experience with Terraform for infrastructure provisioning.
  • Deep knowledge of at least one major cloud provider (AWS, GCP, or Azure).
  • Experience with self-hosted or on-premises software deployments for enterprises.
  • Proficiency in Python, Go, or similar languages for automation and tooling.
  • Strong understanding of networking, databases, and containerization technologies.
  • Experience with monitoring and incident management tools.
  • Ability to communicate technical concepts clearly to customers.
  • Ability to debug complex system issues and implement solutions.

Benefits

  • Be a key player in shaping the success of the product and company.
  • Opportunity to build a new AI product offering with support from an experienced team.
  • Rapid growth potential within the company.
  • Join a culture that values innovation, ownership, accountability, and fun.

Tech Stack

Apache Kafka, AWS, Azure, Datadog, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

Categories

AI & ML, Data Engineering, DevOps, Security