Magic

Member of Technical Staff, Supercomputing Platform & Infrastructure

Posted over 2 years ago
Remote, Worldwide or San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Base Salary

$200k - $550k/yr

Responsibilities

  • Design and operate large-scale GPU clusters for training and inference.
  • Build and maintain infrastructure using Terraform across cloud and hybrid environments.
  • Deploy, operate, and optimize Kubernetes clusters for AI workloads.
  • Develop modular, scalable infrastructure-as-code patterns for provisioning.
  • Improve deployment reproducibility and operational safety.
  • Optimize networking and storage systems for high-throughput AI workloads.
  • Automate fault detection and recovery across distributed clusters.
  • Debug complex cross-layer issues spanning hardware and software.

Requirements

  • Strong systems engineering fundamentals.
  • Deep experience with Terraform, including module design and large-scale deployments.
  • Experience operating production GPU infrastructure or high-performance distributed systems.
  • Strong understanding of networking and storage systems.
  • Experience with major cloud platforms such as GCP, AWS, Azure, or OCI.
  • Track record of owning production-critical infrastructure end-to-end.

Benefits

  • Annual salary between $200K and $550K, depending on experience.
  • Equity is a significant part of total compensation.
  • 401(k) plan with 6% salary matching.
  • Generous health, dental, and vision insurance for you and your dependents.
  • Unlimited paid time off.
  • Visa sponsorship and relocation stipend available.