about 4 hours ago
Toronto, Canada
Senior
H1B Sponsor
Responsibilities
- Drive initiatives to implement and enforce best practices for data streaming, processing, analytics, and monitoring infrastructure.
- Deploy and manage services on Kubernetes-based platforms such as Amazon EKS and Google Kubernetes Engine (GKE).
- Provision and manage cloud infrastructure using Terraform, ensuring best practices in security, scalability, and cost-efficiency.
- Maintain and optimize CI/CD pipelines using Jenkins, ArgoCD, and GitHub Actions (on GitHub Enterprise).
- Work with cloud-native data services such as AWS Kinesis, AWS Glue, Google Dataflow, and Google Pub/Sub.
- Develop and maintain automation scripts and tooling using Python to support DevOps processes.
- Monitor system performance, troubleshoot issues, and implement proactive solutions.
- Implement SRE practices to improve service reliability, scalability, and cost-effectiveness.
- Analyze and optimize cloud costs, identifying areas for improvement.
- Ensure compliance with security policies and best practices in cloud environments.
- Collaborate with cross-functional teams to improve development workflows and infrastructure.
Requirements
- 7+ years of experience in a DevOps, Site Reliability Engineering, or Cloud Infrastructure role.
- Strong experience with AWS and GCP data services, including Kinesis, Glue, Pub/Sub, and Dataflow.
- Proficiency in deploying and managing workloads on Kubernetes (EKS/GKE) in production environments.
- Hands-on experience with Infrastructure-as-Code (IaC) using Terraform.
- Expertise in CI/CD pipeline management using Jenkins, ArgoCD, and GitHub Actions (on GitHub Enterprise).
- Programming skills in Python for automation and scripting.
- Experience with observability and monitoring tools (e.g., Prometheus, Grafana, Datadog, or CloudWatch).
- Strong understanding of SRE principles, including performance monitoring and incident response.
- Experience with cost optimization strategies for cloud infrastructure.
- Self-motivated, with a strong ability to influence and drive change across multiple teams.
- Ability to work collaboratively in an agile environment and support multiple teams.
Tech Stack
Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, AWS, Datadog, GitHub Actions, Google BigQuery, Google Cloud Platform, Grafana, Istio, Jenkins, Kubernetes, Prometheus, Python, RabbitMQ, Snowflake, Terraform
Categories
DevOps