San Francisco, CA, USA or Sunnyvale, CA, USA
Staff+
Base Salary
$208k - $253k/yr
Responsibilities
- Manage and maintain day-to-day operations of Crusoe’s cloud infrastructure.
- Develop automation tools to streamline server provisioning and shorten SLA turnaround times.
- Scale infrastructure to support mass deployments (80-100 servers simultaneously).
- Troubleshoot hardware issues, especially with GPUs, and liaise with vendors.
- Transition Crusoe’s environment to Kubernetes and containerized workflows.
Requirements
- Solid hardware experience and GPU troubleshooting expertise.
- Strong Linux background.
- Knowledge of PXE booting and bare-metal server provisioning.
- Experience with BMC/IPMI, BIOS, and enterprise-grade server management.
- Kubernetes proficiency (admin or developer).
- Familiarity with containerization technologies (Docker preferred).
- Experience with version control systems (GitLab).
- Strong problem-solving skills to analyze complex technical issues.
- Strong communication and collaboration skills.
- Experience with MAAS, Python or Golang, Kubernetes administration, Ansible, and Terraform is a plus.
Benefits
- Industry competitive pay.
- Restricted Stock Units in a fast-growing, well-funded technology company.
- Health insurance package options including HDHP and PPO, vision, and dental.
- Employer contributions to HSAs.
- Paid parental leave.
- Paid life insurance, short-term and long-term disability.
- Teladoc services.
- 401(k) with a 100% match up to 4% of salary.
- Generous paid time off and holiday schedule.
- Cell phone reimbursement.
- Tuition reimbursement.
- Subscription to the Calm app.
- Company paid commuter benefit of $300 per month.
