about 11 hours ago
Remote, United States or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor
Base Salary
$114k - $235k/yr
Responsibilities
- Ensure the reliability, availability, and performance of production infrastructure and platform services.
- Operate and scale Kubernetes platforms, including governance and support for multi-tenant workloads.
- Manage GitOps-based deployment workflows using ArgoCD and Helm.
- Support infrastructure provisioning and change management through Terraform/Terragrunt.
- Build and support CI/CD automation and deployment workflows using GitHub Actions.
- Participate in incident response, root cause analysis, and post-incident improvement initiatives.
- Reduce operational toil through scripting, tooling, and process automation.
- Advance observability practices across logs, metrics, traces, dashboards, and alerting.
- Support secure secrets integration, IAM-aware operations, and platform guardrails.
- Partner closely with application, security, and platform teams to improve reliability and delivery outcomes.
Requirements
- 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.
- Strong hands-on experience operating AWS in production environments.
- Solid expertise in Kubernetes, including cluster operations and troubleshooting.
- Experience with Kubernetes multi-tenancy, including namespaces and RBAC.
- Experience implementing and operating ArgoCD within a GitOps delivery model.
- Strong hands-on experience with Helm.
- Experience with Terraform/Terragrunt for infrastructure provisioning.
- Solid scripting and automation skills using Bash and/or Python.
- Experience building and maintaining CI/CD pipelines, ideally using GitHub Actions.
- Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.
- Experience with monitoring, alerting, and observability in production environments.
- Demonstrated ownership mindset with experience handling incidents.
- Strong collaboration and communication skills.
- Bachelor’s degree in computer science, engineering, or a related field.
- Demonstrated ability to use AI to improve workflow efficiency.
- Strong track record of critical evaluation of AI-assisted work.
- High integrity and ownership in protecting sensitive data.