Dublin, Ireland
Mid Level / Senior
Responsibilities
- Design and implement scalable, reliable, and fault-tolerant systems across cloud environments.
- Develop and maintain observability tools, including monitoring, logging, and alerting.
- Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code tools.
- Optimize system performance, scalability, and incident response workflows.
- Conduct root cause analysis and implement preventative measures to minimize failures.
- Ensure high availability by designing and maintaining load balancing and disaster recovery strategies.
- Improve CI/CD pipelines to enhance deployment speed while maintaining stability.
- Participate in on-call rotations to quickly address system failures.
Requirements
- 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.
- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures.
- Experience with observability and monitoring tools like Prometheus, Grafana, and Datadog.
- Proficiency in Infrastructure as Code tools such as Terraform or CloudFormation.
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Strong Linux system administration and networking fundamentals.
- Experience with incident management and root cause analysis.
- Proficiency in scripting (Bash, Python, or Go) for automation.
- Knowledge of load balancing, failover strategies, and distributed systems.
- Understanding of security best practices and compliance requirements.
Benefits
- Apple hardware ecosystem for work.
- Annual Bonus.
- Top-tier Health and Life Insurance.
- Transportation Budget to support your commute needs.
- Coverflex benefits package for meal allowances and well-being.
- Childcare support.
- Air Conference for team collaboration and growth.
- Pension Fund for long-term financial planning.
- Urban Sports Club membership for fitness activities.
- Free meals at the hub.
