GrepJob
Bespoke Labs

DevOps / Site Reliability Engineer

3 days ago
Remote, Worldwide
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Own cloud infrastructure on AWS including EC2, EKS, RDS, S3, IAM, and VPC.
  • Manage Kubernetes clusters and container orchestration end-to-end.
  • Build and maintain CI/CD pipelines using GitHub Actions or similar tools.
  • Implement monitoring, alerting, and observability stacks such as Prometheus and Grafana.
  • Improve reliability, performance, and security of production systems.
  • Automate infrastructure using Terraform or similar IaC tools.
  • Debug and resolve issues across complex, distributed systems.
  • Participate in design reviews to enhance infrastructure quality.

Requirements

  • 3-5 years of experience in DevOps, SRE, or infrastructure engineering.
  • Strong experience with AWS services such as EKS, EC2, RDS, S3, and IAM.
  • Proficiency in managing Kubernetes, including deployment and troubleshooting.
  • Experience with CI/CD pipelines using GitHub Actions or similar.
  • Familiarity with Infrastructure as Code tools like Terraform or Pulumi.
  • Scripting skills in Python or Go.
  • Experience working in production environments with real users.
  • Ability to operate autonomously in ambiguous situations.

Benefits

  • Competitive compensation and meaningful equity.
  • Direct impact on frontier AI model training and evaluation infrastructure.
  • Flexible, remote-friendly work environment with low bureaucracy.
  • Opportunity to work with a small, high-caliber team with deep AI research expertise.
  • Health, wellness, and learning & development benefits.

Tech Stack

AWS, Datadog, GitHub Actions, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
