Staff Engineer DevOps, Agentic AI
Netskope
2 days ago
Delhi, India
Staff+
H1B Sponsor
Responsibilities
- Design and architect scalable, secure cloud environments using Infrastructure as Code (Terraform).
- Implement and manage CI/CD pipelines for reliable deployments across environments.
- Manage release processes including versioning and rollback strategies.
- Provision and manage Kubernetes clusters ensuring high availability.
- Implement auto-scaling strategies for infrastructure and workloads.
- Set up monitoring, logging, and alerting systems for workloads.
- Operate large Kubernetes clusters supporting production workloads.
- Improve software delivery lifecycle reliability and quality.
- Measure and optimize system performance, identifying bottlenecks.
- Provide operational support for large-scale distributed systems.
Requirements
- 8+ years of experience in building and operating core infrastructure systems.
- Strong hands-on experience with Infrastructure as Code tools like Terraform.
- Deep experience with Kubernetes and container orchestration.
- Experience with major cloud providers such as AWS, Google Cloud, or Azure.
- Experience designing and managing CI/CD pipelines using tools like GitHub Actions or Jenkins.
- Strong scripting skills in Python or Bash, with experience in Git workflows.
- Experience with monitoring and observability tools like Prometheus or Grafana.
- Proven track record of building scalable and secure production systems.
- Strong troubleshooting skills in distributed systems and cloud-native architectures.
- Proactive in identifying reliability risks and automation opportunities.
- Comfortable with ambiguity and rapid change in a dynamic environment.
- Familiarity with LLM development and high-performance ML systems is a plus.
Tech Stack
AWS, Azure, Bash, Git, GitHub Actions, GitLab CI/CD, Google Cloud, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Categories
AI & ML, DevOps