Responsibilities
- Design, build, and operate scalable, reliable, and secure infrastructure across AWS and GCP.
- Lead reliability and modernization initiatives, including container platform migrations.
- Serve as a technical authority in Kubernetes and cloud infrastructure.
- Partner with development teams to architect microservice-based applications.
- Implement and manage infrastructure as code using Terraform and Ansible.
- Drive improvements in observability, performance, and cost efficiency.
- Champion SRE best practices and conduct blameless postmortems.
- Lead complex technical projects from conception to completion.
- Mentor engineers and foster a culture of reliability and automation.
- Collaborate with security and compliance partners to ensure security and compliance best practices are followed.
- Participate in the on-call rotation and help improve systems and processes.
Requirements
- 8+ years in SRE, DevOps, or Infrastructure Engineering roles.
- 3–5 years of experience with Kubernetes (EKS/GKE) in production.
- 3–5 years of experience with AWS and GCP.
- 3–5 years using Terraform for multi-cloud infrastructure management.
- 5+ years of coding experience in Python, Go, or similar languages.
- Proven experience leading ECS to EKS/GKE migrations.
- Experience implementing SLOs/SLIs and improving operational resilience.
- Strong Linux and security fundamentals.
- Bachelor’s degree in Computer Science or equivalent experience.
Tech Stack
Ansible, AWS, GitLab CI/CD, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, MySQL, PostgreSQL, Prometheus, Python, Redis, Spinnaker, Terraform