Site Reliability Engineer - Platform Engineering

3 months ago

Bengaluru, India

Senior

H1B Sponsor

Responsibilities

Design, implement, and maintain scalable infrastructure on Google Cloud Platform.
Develop, own, and operate critical platform services.
Build and maintain Infrastructure as Code using Terraform and Terragrunt.
Establish and maintain SLI/SLO frameworks for critical services.
Implement monitoring, alerting, observability, and incident management solutions.
Conduct incident response and root cause analysis.
Optimize application and infrastructure performance and cost.
Design and implement chaos engineering practices.
Develop self-service platforms and tooling for engineering teams.
Automate operational tasks including scaling and security patching.
Create and maintain infrastructure APIs and abstractions.
Integrate security best practices into infrastructure and platform services.
Implement security monitoring and compliance reporting.
Design secure network architectures and establish disaster recovery procedures.

Qualifications

6-8 years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles.
Proven track record of managing production systems at scale.
Strong background with cloud platforms, particularly GCP or AWS.
Experience with containerization and orchestration platforms such as Docker and Kubernetes.
Proficiency in Node.js and TypeScript for building automation tools.
Advanced experience with Terraform for infrastructure management.
Hands-on experience with monitoring platforms like Datadog.
Strong Linux/Unix systems skills.
Knowledge of security principles for cloud infrastructure.
Familiarity with CI/CD tools and practices.