Software Engineer, Inference – AMD GPU Enablement
OpenAI
5 months ago
San Francisco, CA, USA
Mid Level / Senior
Base Salary
$295k - $555k/yr
Responsibilities
- Own the bring-up, correctness, and performance of the OpenAI inference stack on AMD hardware.
- Integrate internal model-serving infrastructure into various GPU-backed systems.
- Debug and optimize distributed inference workloads across memory, network, and compute layers.
- Validate correctness, performance, and scalability of model execution on large GPU clusters.
- Collaborate with teams to design and optimize high-performance GPU kernels for accelerators.
- Build, integrate, and tune collective communication libraries for parallel model execution.
Requirements
- Experience writing or porting GPU kernels using HIP, CUDA, or Triton.
- Familiarity with communication libraries like NCCL/RCCL.
- Experience with distributed inference systems and scaling models across fleets of accelerators.
- Strong problem-solving skills for end-to-end performance challenges.
- Excitement to work in a small, fast-moving team building new infrastructure.
Categories
AI & ML, Backend