Hippocratic AI

LLM Inference Engineer

8 months ago
Palo Alto, CA, USA
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Design and implement multi-node serving architectures for distributed LLM inference.
  • Optimize multi-LoRA serving systems.
  • Apply advanced quantization techniques to reduce model footprint while preserving quality.
  • Implement speculative decoding and other latency optimization strategies.
  • Develop disaggregated serving solutions with optimized caching strategies.
  • Continuously benchmark and improve system performance across various deployment scenarios.

Requirements

  • Experience optimizing LLM inference systems at scale.
  • Proven expertise with distributed serving architectures for large language models.
  • Hands-on experience implementing quantization techniques for transformer models.
  • Strong understanding of modern inference optimization methods.
  • Proficiency in Python and C++.
  • Experience with CUDA programming and GPU optimization.

Categories

AI & ML, Data Engineering