AlphaSense

Cloud Reliability & Recovery Engineer

6 days ago
Remote, India
Senior / Mid Level
H1B Sponsor

Responsibilities

  • Design and implement multi-region, multi-AZ AWS architectures that meet recovery time objective (RTO) and recovery point objective (RPO) targets.
  • Engineer active-active and active-passive failover patterns using Route 53, Global Accelerator, and CloudFront.
  • Build automated DR runbooks and playbooks using AWS Systems Manager Automation and Step Functions.
  • Implement chaos engineering practices using AWS Fault Injection Service (FIS, formerly AWS Fault Injection Simulator) to validate resiliency.
  • Architect cross-region replication strategies for S3, DynamoDB Global Tables, RDS, and Aurora Global Database.
  • Review Kubernetes-based containerized workloads for resilience.
  • Administer AWS Backup across all services with policy-based automation.
  • Design immutable backup vaults and cross-account/cross-region backup replication pipelines.
  • Develop and automate data recovery testing procedures.
  • Implement point-in-time recovery (PITR) for databases and storage.
  • Maintain Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies.
  • Author and maintain Terraform/CloudFormation templates for BCP/DR infrastructure.
  • Automate DR testing pipelines through CI/CD.
  • Write scripts to orchestrate failover, failback, and health-check workflows.
  • Build CloudWatch dashboards and alarms for availability and DR-readiness indicators.
  • Participate in on-call rotations and lead DR incident response.
  • Conduct regular BCP/DR tabletop exercises and full failover simulations.
  • Ensure DR controls meet compliance requirements.
  • Maintain current and accurate DR documentation.

Requirements

  • 5+ years in cloud infrastructure, SRE, or IT disaster recovery engineering roles.
  • 3+ years of hands-on AWS experience in production environments at scale.
  • Proven delivery of multi-region DR architectures with defined and tested RTO/RPO targets.
  • Expert-level proficiency with core AWS resilience services.
  • Strong scripting skills in Python, Bash, or PowerShell for automation.
  • Experience with Infrastructure as Code: Terraform and/or AWS CloudFormation.
  • Solid understanding of networking fundamentals: VPC, Transit Gateway (TGW), Direct Connect, VPN, and DNS failover.
  • Excellent written and verbal communication skills.

Tech Stack

Amazon DynamoDB, AWS, Bash, GitHub Actions, Kubernetes, PowerShell, Python, Terraform