Pinterest

Site Reliability Engineer II, tvScientific

about 11 hours ago
Remote, United States or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$114k - $235k/yr

Responsibilities

  • Ensure the reliability, availability, and performance of production infrastructure and platform services.
  • Operate and scale Kubernetes platforms, including governance and support for multi-tenant workloads.
  • Manage GitOps-based deployment workflows using ArgoCD and Helm.
  • Support infrastructure provisioning and change management through Terraform/Terragrunt.
  • Build and support CI/CD automation and deployment workflows using GitHub Actions.
  • Participate in incident response, root cause analysis, and post-incident improvement initiatives.
  • Reduce operational toil through scripting, tooling, and process automation.
  • Advance observability practices across logs, metrics, traces, dashboards, and alerting.
  • Support secure secrets integration, IAM-aware operations, and platform guardrails.
  • Partner closely with application, security, and platform teams to improve reliability and delivery outcomes.

Requirements

  • 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.
  • Strong hands-on experience operating AWS in production environments.
  • Solid expertise in Kubernetes, including cluster operations and troubleshooting.
  • Experience with Kubernetes multi-tenancy, including namespaces and RBAC.
  • Experience implementing and operating ArgoCD within a GitOps delivery model.
  • Strong hands-on experience with Helm.
  • Experience with Terraform/Terragrunt for infrastructure provisioning.
  • Solid scripting and automation skills using Bash and/or Python.
  • Experience building and maintaining CI/CD pipelines, ideally using GitHub Actions.
  • Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.
  • Experience with monitoring, alerting, and observability in production environments.
  • Demonstrated ownership mindset with experience handling incidents.
  • Strong collaboration and communication skills.
  • Bachelor’s degree in computer science, engineering, or a related field.
  • Demonstrated ability to use AI to improve workflow efficiency.
  • Strong track record of critical evaluation of AI-assisted work.
  • High integrity and ownership in protecting sensitive data.

Tech Stack

AWS, Bash, GitHub Actions, Helm, Kubernetes, Linux, Python, Terraform

Categories