11 months ago
Remote, Worldwide +2 more
Mid Level / Senior
H1B Sponsor
Responsibilities
- Write high-performance GPU kernels for novel model architectures.
- Integrate kernels into PyTorch pipelines, including custom ops and benchmarking.
- Profile and optimize training and inference workflows to eliminate bottlenecks.
- Build correctness tests and numerics checks.
- Maintain performance benchmarks and guardrails to prevent regressions.
- Collaborate closely with researchers to implement speedup techniques.
Requirements
- Must have authored custom CUDA kernels beyond using cuDNN/cuBLAS.
- Strong understanding of GPU architecture and performance metrics.
- Proficiency with low-level profiling tools such as Nsight Systems and Nsight Compute.
- Strong C/C++ programming skills.
- Nice to have: experience with CUTLASS and tensor core utilization strategies.
- Experience with Triton kernels and PyTorch custom op integration is a plus.
Benefits
- Unique optimization challenges with high ownership from day one.
- Competitive base salary with equity in a unicorn-stage company.
- 100% coverage of medical, dental, and vision premiums for employees and dependents.
- 401(k) matching up to 4% of base pay.
- Unlimited PTO plus company-wide Refill Days throughout the year.
