about 13 hours ago
Santa Clara, CA, USA
Staff+
H1B Sponsor
Base Salary
$119k - $243k/yr
Responsibilities
- Design and architect scalable, secure cloud environments using Infrastructure as Code (Terraform).
- Implement and manage CI/CD pipelines for reliable deployments across environments.
- Manage release processes including versioning and deployment strategies.
- Provision and manage Kubernetes clusters ensuring high availability.
- Implement auto-scaling strategies for infrastructure and workloads.
- Set up monitoring, logging, and alerting systems for workloads.
- Oversee large Kubernetes clusters supporting production workloads.
- Improve software delivery lifecycle reliability and quality.
- Measure and optimize system performance, identifying bottlenecks.
- Provide operational support for large-scale distributed systems.
Requirements
- 10+ years of experience in building and operating core infrastructure systems.
- Strong hands-on experience with Infrastructure as Code tools like Terraform.
- Deep experience with Kubernetes and container orchestration.
- Experience with major cloud providers such as AWS, Google Cloud, or Azure.
- Experience designing and managing CI/CD pipelines using tools like GitHub Actions or Jenkins.
- Strong scripting skills in Python or Bash, and familiarity with Git workflows.
- Experience with monitoring and observability tools like Prometheus or Grafana.
- Proven track record of building scalable and secure production systems.
- Strong troubleshooting skills in distributed systems and cloud-native architectures.
- Proactive in identifying reliability risks and automation opportunities.
- Comfortable with ambiguity and rapid change in a dynamic environment.
- Familiarity with LLM development and high-performance ML systems is a plus.
Tech Stack
AWS, Azure, Bash, Git, GitHub Actions, GitLab CI/CD, Google Cloud, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Categories
AI & ML, DevOps