
Head of Inference, Stealth Edge AI Co
Montauk Capital · 3 days ago
Responsibilities
- Define the inference strategy and architecture for Edge AI.
- Own the inference serving layer end-to-end, built on frameworks such as vLLM and TensorRT-LLM.
- Build a credible proof of concept to demonstrate platform capabilities.
- Drive down cost per token and maximize GPU utilization.
- Build distributed inference pipelines across multi-GPU, multi-node edge deployments.
- Set performance baselines and SLAs for inference latency and throughput.
- Define the software access layer architecture and oversee integration efforts.
- Engage with investors, partners, and technical stakeholders.
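The cost-per-token and SLA responsibilities above usually start from a back-of-the-envelope baseline. A minimal sketch follows; the GPU price, throughput, and latency samples are illustrative assumptions, not figures from this posting.

```python
# Back-of-the-envelope cost-per-token and latency-SLA baseline.
# All numbers below are illustrative assumptions, not from the posting.

def cost_per_million_tokens(gpu_dollars_per_hour: float,
                            tokens_per_second: float) -> float:
    """Serving cost in dollars per 1M generated tokens on one GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=0.99 for a p99 latency SLA."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[idx]

# Hypothetical edge GPU at $1.20/hr sustaining 800 tokens/s:
print(round(cost_per_million_tokens(1.20, 800), 4))   # dollars per 1M tokens

# Hypothetical per-request latency samples in milliseconds:
latencies_ms = [42, 45, 44, 43, 120, 46, 44, 43, 45, 44]
print(percentile(latencies_ms, 0.99))                 # candidate p99 SLA bound
```

Baselines like these make the trade-off explicit: raising batch size improves tokens per second (and thus cost per token) but pushes tail latency up, which is exactly the tension the serving layer has to manage.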
Requirements
- Hands-on experience implementing production inference systems.
- Deep knowledge of model serving and practical engineering for inference.
- Experience with observability tooling and debugging complex distributed systems.
- Proficiency in C++, CUDA, or Rust.
- Expertise in GPU utilization and CUDA kernel optimization.
- Experience with Kubernetes, Ray, and custom load balancing.
- Technical leadership experience in startup environments.
Benefits
- Opportunity to solve the AI inference bottleneck with innovative solutions.
- Access to Montauk Capital's resources and operational expertise.
- Competitive compensation and equity for true ownership of your work.