
DevOps / Site Reliability Engineer
Bespoke Labs
Posted 3 days ago
Remote, Worldwide
Mid Level / Senior
H1B Sponsor
Responsibilities
- Own cloud infrastructure on AWS including EC2, EKS, RDS, S3, IAM, and VPC.
- Manage Kubernetes clusters and container orchestration end-to-end.
- Build and maintain CI/CD pipelines using GitHub Actions or similar tools.
- Implement monitoring, alerting, and observability using tools such as Prometheus and Grafana.
- Improve reliability, performance, and security of production systems.
- Automate infrastructure using Terraform or similar IaC tools.
- Debug and resolve issues across complex, distributed systems.
- Participate in design reviews to enhance infrastructure quality.
Requirements
- 3-5 years of experience in DevOps, SRE, or infrastructure engineering.
- Strong experience with AWS services such as EKS, EC2, RDS, S3, and IAM.
- Proficiency with Kubernetes, including deployment and troubleshooting.
- Experience with CI/CD pipelines using GitHub Actions or similar.
- Familiarity with Infrastructure as Code tools like Terraform or Pulumi.
- Scripting skills in Python or Go.
- Experience working in production environments with real users.
- Ability to operate autonomously in ambiguous situations.
Benefits
- Competitive compensation and meaningful equity.
- Direct impact on frontier AI model training and evaluation infrastructure.
- Flexible, remote-friendly work environment with low bureaucracy.
- Opportunity to work with a small, high-caliber team with deep AI research expertise.
- Health, wellness, and learning & development benefits.