Senior Software Engineer - DevOps
Roku
about 1 month ago
Bengaluru, India
Senior / Staff+
H1B Sponsor
Responsibilities
- Write clean, maintainable Python for services, automation, and infrastructure tooling.
- Design, migrate, and operate Kubernetes clusters in production.
- Lead cluster upgrades, workload migrations, autoscaling, and capacity planning.
- Implement safe deployment strategies (rolling, canary, blue/green).
- Manage infrastructure as code (Terraform or equivalent), fully checked into version control.
- Operate multi-environment (dev/stage/prod) and multi-region setups on AWS and/or GCP.
- Build and maintain CI/CD pipelines for large monorepos.
- Support deployments for web applications, background workers, and ETL and batch pipelines.
- Improve release safety, rollback mechanisms, and developer velocity.
- Design and maintain telemetry across services using metrics, logs, and traces.
- Set up PagerDuty alerts, on-call workflows, and incident response processes.
- Define and track SLIs, SLOs, and service health indicators.
- Support GenAI-powered services used in Ads automation.
- Implement observability for LLM systems, including latency, throughput, error rates, and infrastructure-level reliability.
Requirements
- 8+ years of experience in SRE, Platform Engineering, Infrastructure or Backend Engineering roles.
- Strong Python proficiency for services, pipelines, and automation.
- Hands-on production experience with Kubernetes.
- Experience working with AWS or GCP.
- Familiarity with Linux systems.
- Knowledge of networking (DNS, VPCs, routing).
- Experience with distributed systems.
- Experience with CI/CD pipelines, monorepos, and infrastructure-as-code.
- Experience building or operating observability and alerting systems.
- Experience with Airflow or large-scale ETL/data systems.
- Experience supporting GenAI / ML infrastructure in production.
- Prior on-call ownership for production systems.
Benefits
- Comprehensive benefits including healthcare, life, accident, disability, and retirement options.
- Global access to mental health and financial wellness support.
- Flexible work arrangements with a hybrid work approach.
- Time off for vacation and personal reasons.
Tech Stack
Apache Airflow, AWS, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform
Categories
AI & ML, Backend, Data Engineering, DevOps