Montauk Capital

Head of Inference, Stealth Edge AI Co

Posted 3 days ago

Responsibilities

  • Create the inference strategy and define the inference architecture for Edge AI.
  • Own the inference serving layer end-to-end, using technologies such as vLLM and TensorRT-LLM.
  • Build a credible proof of concept to demonstrate platform capabilities.
  • Drive down cost per token and improve GPU utilization.
  • Build distributed inference pipelines across multi-GPU, multi-node edge deployments.
  • Set performance baselines and SLAs for inference latency and throughput.
  • Define the software access layer architecture and oversee integration efforts.
  • Engage with investors, partners, and technical stakeholders.

Requirements

  • Hands-on experience implementing production inference systems.
  • Deep knowledge of model serving and practical engineering for inference.
  • Experience with observability tooling and debugging complex distributed systems.
  • Proficiency in C++, CUDA, or Rust.
  • Expertise in GPU utilization and CUDA kernel optimization.
  • Experience with Kubernetes, Ray, and custom load balancing.
  • Technical leadership experience in startup environments.

Benefits

  • Opportunity to solve the AI inference bottleneck with innovative solutions.
  • Access to Montauk Capital's resources and operational expertise.
  • Competitive compensation and equity for true ownership of your work.

Tech Stack

C++, Kubernetes, Rust

Categories

AI & ML, Backend, Data Engineering