Responsibilities
- Design and optimize large-scale Aerospike cloud platform infrastructure.
- Lead development of automation and infrastructure-as-code solutions.
- Build and maintain monitoring and observability systems.
- Conduct incident response and post-mortem activities.
- Enforce security best practices for cloud infrastructure.
- Collaborate with development teams to ensure reliable service delivery.
- Participate in the on-call rotation for critical incidents.
- Establish documentation standards and runbooks.
- Lead capacity planning and performance optimization efforts.
- Mentor junior engineers and share knowledge.
Requirements
- 6+ years of experience in Site Reliability Engineering, DevOps, or related fields.
- Hands-on experience with production-grade cloud systems.
- Expertise with at least one major public cloud provider.
- Strong proficiency in infrastructure-as-code tools like Terraform.
- Experience in CI/CD pipeline design and implementation.
- Deep understanding of Linux/Unix systems and networking.
- Proficiency in scripting or programming languages such as Python, Bash, or Go.
- Experience with containerization and orchestration technologies.
- Hands-on experience with monitoring and observability tools.
- Strong problem-solving skills with an engineering-first mindset.
- Experience implementing security best practices in cloud environments.
- Excellent English communication skills.
