over 2 years ago
Remote, Worldwide or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor
Base Salary
$200k - $550k/yr
Responsibilities
- Design and operate large-scale GPU clusters for training and inference.
- Build and maintain infrastructure using Terraform across cloud and hybrid environments.
- Deploy, operate, and optimize Kubernetes clusters for AI workloads.
- Develop modular, scalable infrastructure-as-code patterns for provisioning.
- Improve deployment reproducibility and operational safety.
- Optimize networking and storage systems for high-throughput AI workloads.
- Automate fault detection and recovery across distributed clusters.
- Debug complex cross-layer issues spanning hardware and software.
Requirements
- Strong systems engineering fundamentals.
- Deep experience with Terraform, including module design and large-scale deployments.
- Experience operating production GPU infrastructure or high-performance distributed systems.
- Strong understanding of networking and storage systems.
- Experience with major cloud platforms such as GCP, AWS, Azure, or OCI.
- Track record of owning production-critical infrastructure end-to-end.
Benefits
- Annual salary range of $200K to $550K, depending on experience.
- Equity is a significant part of total compensation.
- 401(k) plan with 6% salary matching.
- Generous health, dental, and vision insurance for you and your dependents.
- Unlimited paid time off.
- Visa sponsorship and relocation stipend available.
