
Head of Inference, Stealth Edge AI Co
Montauk Capital · 3 days ago
Responsibilities
- Define the inference strategy and architecture for Edge AI.
- Own the inference serving layer end-to-end, built on frameworks such as vLLM and TensorRT-LLM.
- Build a credible proof of concept to demonstrate platform capabilities.
- Drive down cost per token and maximize GPU utilization.
- Build distributed inference pipelines across multi-GPU, multi-node edge deployments.
- Set performance baselines and SLAs for inference latency and throughput.
- Define the software access layer architecture and oversee integration efforts.
- Engage with investors, partners, and technical stakeholders.
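The cost-per-token and SLA responsibilities above usually start from a back-of-the-envelope baseline. A minimal sketch follows; the GPU price, throughput, and latency samples are illustrative assumptions, not figures from this posting.

```python
# Back-of-the-envelope cost-per-token and latency-SLA baseline.
# All numbers below are illustrative assumptions, not from the posting.

def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    """Serving cost in dollars per 1M generated tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=0.99 for a p99 latency SLA."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[idx]

# Hypothetical edge GPU at $1.20/hr sustaining 800 tokens/s:
print(round(cost_per_million_tokens(1.20, 800), 4))   # dollars per 1M tokens

# Hypothetical per-request latency samples in milliseconds:
latencies_ms = [42, 45, 44, 43, 120, 46, 44, 43, 45, 44]
print(percentile(latencies_ms, 0.99))                 # candidate p99 SLA bound
```

Baselines like these make the trade-off explicit: raising batch size improves tokens per second (and thus cost per token) but pushes tail latency up, which is exactly the tension the serving layer has to manage.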
Requirements
- Hands-on experience implementing production inference systems.
- Deep knowledge of model serving and practical engineering for inference.
- Experience with observability tooling and debugging complex distributed systems.
- Proficiency in C++, CUDA, or Rust.
- Expertise in GPU utilization and CUDA kernel optimization.
- Experience with Kubernetes, Ray, and custom load balancing.
- Technical leadership experience in startup environments.
Benefits
- Opportunity to solve the AI inference bottleneck with innovative solutions.
- Access to Montauk Capital's resources and operational expertise.
- Competitive compensation and equity for true ownership of your work.