d-Matrix

Principal LLM Inference Engineer

2 days ago
Santa Clara, CA, USA
Staff+
H-1B Sponsor

Base Salary

$195k - $285k/yr

Responsibilities

  • Identify and prototype emerging LLM inference use cases suited to heterogeneous hardware deployments.
  • Build compelling proof-of-concept systems that demonstrate d-Matrix capabilities.
  • Develop and tune custom kernels and operator-level optimizations.
  • Drive quantization, sparsity, and batching strategies tailored to d-Matrix's computational model.
  • Build and maintain inference runtimes, serving frameworks, and evaluation tooling.
  • Contribute to distributed inference systems including tensor/pipeline parallelism.
  • Work closely with hardware architects to provide actionable inference workload insights.
  • Partner with product and business development to translate POCs into customer-facing demonstrations.
  • Contribute to technical publications and open-source projects.

Requirements

  • Bachelor’s degree in Computer Science, Electrical Engineering, or a related field with 10+ years of relevant experience.
  • Master’s or PhD in a related field preferred, with 6+ years of industry experience.
  • Strong proficiency in Python and C/C++.
  • Hands-on experience optimizing LLM inference including attention kernels and quantization.
  • Experience with at least one major inference framework at a contributor level.
  • Familiarity with GPU kernel programming and performance profiling tools.

Benefits

  • Work on genuinely novel hardware with unique inference optimization challenges.
  • End-to-end ownership from idea to deployed system with a short feedback loop.
  • Small, senior team with high autonomy and direct influence on product direction.
  • Competitive compensation, equity, and benefits in Santa Clara, CA.

Categories

AI & ML, Backend, Data Engineering