Roku

Senior Software Engineer - DevOps

about 1 month ago
Bengaluru, India
Senior / Staff+
H1B Sponsor

Responsibilities

  • Write clean, maintainable Python for services, automation, and infrastructure tooling.
  • Design, migrate and operate Kubernetes clusters in production.
  • Lead cluster upgrades, workload migrations, autoscaling and capacity planning.
  • Implement safe deployment strategies (rolling, canary, blue/green).
  • Manage infrastructure as code (Terraform or equivalent), with all changes checked into version control.
  • Operate multi-environment (dev/stage/prod) and multi-region setups on AWS and/or GCP.
  • Build and maintain CI/CD pipelines for large monorepos.
  • Support deployments for web applications, background workers, and ETL and batch pipelines.
  • Improve release safety, rollback mechanisms, and developer velocity.
  • Design and maintain telemetry across services using metrics, logs, and traces.
  • Set up PagerDuty alerts, on-call workflows, and incident response processes.
  • Define and track SLIs, SLOs, and service health indicators.
  • Support GenAI-powered services used in Ads automation.
  • Implement observability for LLM systems, including latency, throughput, error rates, and infrastructure-level reliability.

Requirements

  • 8+ years of experience in SRE, Platform Engineering, Infrastructure or Backend Engineering roles.
  • Strong Python proficiency for services, pipelines, and automation.
  • Hands-on production experience with Kubernetes.
  • Experience working with AWS or GCP.
  • Familiarity with Linux systems.
  • Knowledge of networking (DNS, VPCs, routing).
  • Experience with distributed systems.
  • Experience with CI/CD pipelines, monorepos, and infrastructure-as-code.
  • Experience building or operating observability and alerting systems.
  • Experience with Airflow or large-scale ETL/data systems.
  • Experience supporting GenAI / ML infrastructure in production.
  • Prior on-call ownership for production systems.

Benefits

  • Comprehensive benefits including healthcare, life, accident, disability, and retirement options.
  • Global access to mental health and financial wellness support.
  • Flexible work arrangements with a hybrid work approach.
  • Time off for vacation and personal reasons.

Tech Stack

Apache Airflow, AWS, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

Categories

AI & ML, Backend, Data Engineering, DevOps