Staff+ Software Engineer, Inference Runtime

about 2 months ago

Remote, Worldwide +3 more

Staff+

H1B Sponsor

Base Salary

$405k - $485k/yr

Responsibilities

Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack.
Own and evolve the accelerator-agnostic runtime, including hands-on work in a performance-sensitive Rust and Python codebase.
Ensure new models and deployment targets pay only for their own specialization, keeping expansion costs low.
Drive efficient accelerator usage across GPU, TPU, and Trainium.
Build the runtime's validation surface around partitioned builds and change-scoped testing.
Act as a technical counterpart to the central Infrastructure org on compilers and build systems.
Mentor engineers through design and code reviews, raising the technical bar.

Qualifications

Deep background in systems engineering or ML infrastructure with hands-on experience in performance profiling and optimization.
Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron).
Significant software engineering experience in high-performance, large-scale distributed systems.
Track record of defining and using engineering metrics to drive improvement.
Experience driving technical alignment across organizational boundaries.
Strong written and verbal communication skills.