13 days ago
Palo Alto, CA, USA or Seattle, WA, USA
Mid Level / Senior
H-1B Sponsor
Base Salary
$180k - $440k/yr
Responsibilities
- Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads.
- Develop and tune low-level CUDA kernels for maximum performance.
- Work on Linux kernel internals, scheduling, memory management, and resource isolation.
- Build custom container orchestration and virtualization layers beyond standard Kubernetes.
- Profile, debug, and eliminate bottlenecks across GPU memory hierarchy and networking fabric.
- Create and maintain infrastructure-as-code and automation tools for supercomputer reliability.
- Collaborate closely with AI research teams to deliver production-grade performance.
Requirements
- Deep low-level systems programming experience in C/C++ or Rust.
- Experience building and operating high-performance, exabyte-scale storage systems.
- Strong experience with large-scale GPU clusters or distributed compute infrastructure.
- Hands-on work with GPU kernel optimization and profiling tools.
- Experience with Linux kernel internals and large-scale orchestration.
- Track record of building or running high-performance infrastructure for AI workloads.
- Ability to reason from first principles and optimize for memory-bound and compute-bound scenarios.
Benefits
- Equity in the company.
- Comprehensive medical, vision, and dental coverage.
- Access to a 401(k) retirement plan.
- Short-term and long-term disability insurance.
- Life insurance and various discounts and perks.