Base Salary
$405k - $485k/yr
Responsibilities
- Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack.
- Own and evolve the accelerator-agnostic runtime, including hands-on work in a performance-sensitive Rust and Python codebase.
- Ensure new models and deployment targets pay only for their own specialization, keeping expansion costs low.
- Drive efficient accelerator usage across GPU, TPU, and Trainium.
- Build the runtime's validation surface around partitioned builds and change-scoped testing.
- Act as a technical counterpart to the central Infrastructure org on compilers and build systems.
- Mentor engineers through design and code reviews, raising the technical bar.
Requirements
- Deep background in systems engineering or ML infrastructure with hands-on experience in performance profiling and optimization.
- Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron).
- Significant software engineering experience in high-performance, large-scale distributed systems.
- Track record of defining and using engineering metrics to drive improvement.
- Experience driving technical alignment across organizational boundaries.
- Strong written and verbal communication skills.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Collaborative office space.