Member of Technical Staff - Compute Infrastructure

about 2 months ago

Palo Alto, CA, USA or Seattle, WA, USA

Mid Level / Senior

H1B Sponsor

Base Salary

$180k - $440k/yr

Responsibilities

Design, build, and optimize massive GPU clusters for extreme-scale training and inference workloads.
Develop and tune low-level CUDA kernels for maximum performance.
Work on Linux kernel internals, scheduling, memory management, and resource isolation.
Build custom container orchestration and virtualization layers beyond standard Kubernetes.
Profile, debug, and eliminate bottlenecks across GPU memory hierarchy and networking fabric.
Create and maintain infrastructure-as-code and automation tools for supercomputer reliability.
Collaborate closely with AI research teams to deliver production-grade performance.

Requirements

Deep low-level systems programming experience in C/C++ or Rust.
Experience building and operating high-performance, exabyte-scale storage systems.
Strong experience with large-scale GPU clusters or distributed compute infrastructure.
Hands-on work with GPU kernel optimization and profiling tools.
Experience with Linux kernel internals and large-scale orchestration.
Track record of building or running high-performance infrastructure for AI workloads.
Ability to reason from first principles and optimize for memory-bound and compute-bound scenarios.