GrepJob

Member of Technical Staff - Inference

xAI
Posted 13 days ago
Palo Alto, CA, USA
Mid Level / Staff+
H-1B Sponsor

Base Salary

$180k - $440k/yr

Responsibilities

  • Architect and implement scalable distributed infrastructure for model serving.
  • Optimize latency and throughput of model inference under real production workloads.
  • Build reliable, high-concurrency serving systems targeting maximal uptime and minimal error rates.
  • Benchmark, fine-tune, and accelerate inference engines, including GPU kernel work.
  • Develop custom tools to trace, replay, and fix issues across the full stack.
  • Create robust CI/CD infrastructure for seamless endpoint deployment and updates.
  • Accelerate research on scaling test-time compute and model-hardware co-design.
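The "optimize latency and throughput" responsibility above typically starts with careful measurement. A minimal illustrative sketch (not xAI's actual tooling; the `benchmark` helper and the stand-in workload are hypothetical) of collecting p50/p99 latency and requests-per-second for a serving function:

```python
import time


def benchmark(fn, requests, warmup=10):
    """Measure per-request latency percentiles and overall throughput of fn."""
    for r in requests[:warmup]:  # warm up caches before timing
        fn(r)
    latencies = []
    start = time.perf_counter()
    for r in requests:
        t0 = time.perf_counter()
        fn(r)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[int(len(latencies) * 0.99)]
    return {"p50_s": p50, "p99_s": p99, "rps": len(requests) / elapsed}


# Example: benchmark a stand-in "model" that just sleeps ~1 ms per request.
stats = benchmark(lambda r: time.sleep(0.001), list(range(100)))
```

In production, the same idea is applied with realistic traffic traces rather than synthetic requests, since tail latency under real workloads is what the posting emphasizes.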

Requirements

  • Deep low-level systems programming experience in C/C++ or Rust.
  • Experience with large-scale, high-concurrency production serving.
  • Familiarity with GPU inference engines like vLLM, SGLang, and TensorRT-LLM.
  • Strong background in system optimizations such as batching and caching.
  • Experience with low-level inference optimizations including GPU kernels.
  • Knowledge of algorithmic inference optimizations like quantization and distillation.
  • Experience in testing, benchmarking, and ensuring reliability of inference services.
  • Experience designing and implementing CI/CD infrastructure for inference.
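Among the system optimizations listed above, batching is the classic latency/utilization trade-off in inference serving: requests are grouped until either a batch-size cap or a wait deadline is hit. A minimal single-threaded sketch (the `batcher` function and its parameters are illustrative, not any specific engine's API):

```python
import queue
import time


def batcher(requests_q, handle_batch, max_batch=8, max_wait_s=0.005):
    """Drain a request queue into batches: flush when the batch is full or
    when the oldest queued request has waited max_wait_s.  A StopIteration
    sentinel flushes any remainder and exits."""
    batch = []
    deadline = None
    while True:
        timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
        try:
            item = requests_q.get(timeout=timeout)
        except queue.Empty:
            item = None  # deadline expired: flush whatever we have
        if item is not None:
            if item is StopIteration:
                if batch:
                    handle_batch(batch)
                return
            batch.append(item)
            if deadline is None:  # start the wait clock on the first request
                deadline = time.monotonic() + max_wait_s
        if batch and (len(batch) >= max_batch or item is None):
            handle_batch(batch)
            batch, deadline = [], None


# Usage: ten queued requests with a batch cap of 4 yield batches of 4, 4, 2.
out = []
q = queue.Queue()
for i in range(10):
    q.put(i)
q.put(StopIteration)
batcher(q, out.append, max_batch=4, max_wait_s=0.01)
```

Real engines such as vLLM go further with continuous (token-level) batching, but the size-cap/deadline structure above is the core idea.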

Benefits

  • Equity in the company.
  • Comprehensive medical, vision, and dental coverage.
  • Access to a 401(k) retirement plan.
  • Short and long-term disability insurance.
  • Life insurance and various discounts and perks.

Tech Stack

C, C++, Rust

Categories

AI & ML, Backend, Data Engineering, Testing