4 days ago
Remote, United States
Senior
H1B Sponsor
Base Salary
$82k - $229k/yr
Responsibilities
- Operate and support enterprise compute platforms across hardware, OS, virtualization, and container orchestration layers.
- Deploy and maintain bare metal server infrastructure running Ubuntu, with Kubernetes and virtualization platforms including OpenStack and Harvester.
- Implement and maintain PXE-based provisioning environments leveraging Redfish APIs for large-scale server deployments.
- Install, patch, and maintain operating systems including Ubuntu and Harvester.
- Operate and support virtualization and private cloud platforms, including KVM on Ubuntu, OpenStack environments, and Harvester HCI.
- Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Git, with Python/Bash automation.
- Implement CI/CD pipelines for infrastructure updates, patching, upgrades, testing, and rollback.
- Monitor system performance, capacity, and availability; proactively address reliability risks.
- Troubleshoot complex cross-stack issues spanning hardware, OS, virtualization, OpenStack, and Kubernetes.
- Manage to SLAs, KPIs, and error budgets.
- Participate in on-call escalation support for complex platform-related issues.
- Collaborate globally on change management, documentation, and operational best practices.
Requirements
- 6+ years of experience as a DevOps Engineer, Site Reliability Engineer, or Infrastructure Operations Engineer with a strong focus on compute.
- Strong hands-on experience operating bare metal compute environments at scale.
- Experience with PXE boot, automated OS provisioning, and server imaging systems.
- Practical experience supporting Bare Metal as a Service (BMaaS) platforms leveraging Redfish APIs.
- Strong Linux administration skills, especially with Ubuntu.
- Operational experience with virtualization and private cloud platforms, including KVM on Ubuntu, OpenStack operations, and troubleshooting.
- Experience deploying and operating production Kubernetes environments.
- Expertise with enterprise compute hardware, including Cisco UCS, Dell PowerEdge, Supermicro, and HPE systems.
- Proficiency with Infrastructure as Code tools (e.g., Terraform, Ansible, or similar).
- Experience building or supporting CI/CD pipelines for infrastructure and platform automation.
- Strong scripting skills in Python, Bash, or similar languages.
- Strong understanding of SRE practices such as toil reduction, error budgets, and meeting SLAs.
- Proven troubleshooting and root cause analysis skills in complex distributed systems.
- Excellent written and verbal communication skills.
- Bachelor’s degree in computer science or equivalent professional experience.
Benefits
- Health, dental, and vision coverage, beginning on the first day of employment.
- Access to an innovative mental health support platform.
- Generous employee stock purchase plan.
- Paid Time Off, company paid holidays, paid volunteer hours, and 12 weeks paid parental leave.
Tech Stack
Ansible, Bash, Git, Helm, Kubernetes, Linux, OpenStack, Python, Terraform
Categories
DevOps