TELUS Digital

Lead DevOps Engineer

about 4 hours ago
São Paulo, Brazil
Staff+
H1B Sponsor

Responsibilities

  • Define platform reliability strategy and establish SLOs/SLIs for AI services.
  • Design scalable and secure cloud architecture on GCP for distributed AI services.
  • Build observability, metrics, and alerting for LLM-powered features.
  • Implement resilience engineering practices for AI inference paths.
  • Automate infrastructure management using Terraform and other tools.
  • Enforce production readiness standards across teams launching new AI products.
  • Mentor engineers and drive architecture reviews to enhance engineering culture.

Requirements

  • Significant experience in infrastructure engineering combining DevOps and SRE disciplines.
  • Deep expertise in GCP, with relevant cloud certifications preferred.
  • Production experience with SRE fundamentals including SLO/SLI design.
  • Strong background in distributed systems and resilience patterns.
  • Expertise in infrastructure-as-code (Terraform) and container orchestration (Kubernetes).
  • Hands-on experience with modern observability stacks and AI-specific tooling.
  • Proficiency in Python, JavaScript, and Bash for infrastructure tooling.