Site Reliability Engineer II, tvScientific

about 2 months ago

Remote, United States or San Francisco, CA, USA

Mid Level / Senior

H1B Sponsor

Base Salary

$114k - $235k/yr

Responsibilities

Ensure the reliability, availability, and performance of production infrastructure and platform services.
Operate and scale Kubernetes platforms, including governance and support for multi-tenant workloads.
Manage GitOps-based deployment workflows using ArgoCD and Helm.
Support infrastructure provisioning and change management through Terraform/Terragrunt.
Build and support CI/CD automation and deployment workflows using GitHub Actions.
Participate in incident response, root cause analysis, and post-incident improvement initiatives.
Reduce operational toil through scripting, tooling, and process automation.
Advance observability practices across logs, metrics, traces, dashboards, and alerting.
Support secure secrets integration, IAM-aware operations, and platform guardrails.
Partner closely with application, security, and platform teams to improve reliability and delivery outcomes.

Qualifications

4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.
Strong hands-on experience operating AWS in production environments.
Solid expertise in Kubernetes, including cluster operations and troubleshooting.
Experience with Kubernetes multi-tenancy, including namespaces and RBAC.
Experience implementing and operating ArgoCD within a GitOps delivery model.
Strong hands-on experience with Helm.
Experience with Terraform/Terragrunt for infrastructure provisioning.
Solid scripting and automation skills using Bash and/or Python.
Experience building and maintaining CI/CD pipelines, ideally using GitHub Actions.
Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.
Experience with monitoring, alerting, and observability in production environments.
Demonstrated ownership mindset with experience handling incidents.
Strong collaboration and communication skills.
Bachelor’s degree in computer science, engineering, or a related field.
Demonstrated ability to use AI to improve workflow efficiency.
Strong track record of critical evaluation of AI-assisted work.
High integrity and ownership in protecting sensitive data.