Responsibilities
- Partner with engineers to build developer tools that enhance workflows and deployment infrastructure.
- Ensure the reliability of multi-cloud Kubernetes clusters and pipelines.
- Implement metrics, logging, analytics, and alerting for performance and security.
- Develop infrastructure-as-code deployment tooling across multiple cloud providers.
- Automate operations and engineering processes to improve efficiency.
- Build machine learning infrastructure for AI teams to work with large-scale datasets.
Requirements
- 5+ years of experience in DevOps, Site Reliability Engineering, or a related field.
- Deep proficiency in a programming language such as Go or Python.
- Strong familiarity with container-related security best practices.
- Production experience with Kubernetes and its ecosystem, including tools like cert-manager or external-dns.
- Experience with Kubernetes templating tools such as Helm or Kustomize.
- Proficient in infrastructure-as-code tools like Terraform or CloudFormation.
- Experience with AWS services such as IAM, S3, EC2, and EKS.
- Familiarity with other cloud providers like Google Cloud and Azure is a plus.
- Production experience with database software such as PostgreSQL.
- Experience with GitOps tooling like Flux or Argo and CI/CD tools such as GitHub Actions.
Benefits
- Variety of medical, dental, and vision plans for employees and their families.
- Paid parental leave to support family needs.
- Monthly Health & Wellness allowance.
- Home-office stipend to support remote work.
