Roku

Senior Software Engineer - SRE

about 3 hours ago
Bengaluru, India
Senior
H1B Sponsor

Responsibilities

  • Facilitate blameless post-incident reviews to identify root causes and prioritize reliability improvements.
  • Implement chaos engineering practices to validate system resilience and recovery procedures.
  • Establish core SRE principles and frameworks across the organization.
  • Manage error budgets to balance feature velocity with system reliability.
  • Automate repetitive operational tasks to reduce toil.
  • Implement capacity planning processes to ensure systems meet SLOs.
  • Build observability systems for deep visibility into service health and performance.
  • Create SRE dashboards for real-time visibility into reliability metrics.
  • Partner with development teams to implement reliability from the design phase.
  • Drive continuous improvement through SRE feedback loops and documentation.

Requirements

  • 8+ years of experience in DevOps/SRE roles with expertise in SRE principles.
  • Deep experience with observability and monitoring platforms like Prometheus and Grafana.
  • Strong background in incident management and conducting blameless postmortems.
  • Understanding of distributed systems and reliability engineering concepts.
  • Experience with Kubernetes, Docker, and service mesh technologies.
  • Proficiency in cloud-focused software development, preferably in Go or Python.
  • Experience with Infrastructure as Code tools like Terraform or Ansible.
  • Hands-on experience with cloud platforms such as AWS, GCP, or Azure.
  • Ability to communicate effectively with technical and non-technical stakeholders.
  • BS degree in Computer Science or equivalent.

Benefits

  • Comprehensive benefits including healthcare, life, and retirement options.
  • Global access to mental health and financial wellness support.
  • Flexible work arrangements with a hybrid work approach.
  • Time off for vacation and personal reasons.

Tech Stack

Ambassador, Ansible, AWS, Azure, Datadog, Docker, Go, Google Cloud Platform, Grafana, Istio, Kubernetes, Prometheus, Python, Terraform

Categories

Backend, DevOps