Senior Software Engineer - Model Performance

Inference
5 months ago
San Francisco, CA, USA
Senior / Mid Level
H1B Sponsor

Base Salary

$220k - $320k/yr

Responsibilities

  • Implement and productionize optimization techniques including quantization and continuous batching.
  • Dive deep into inference frameworks to debug and improve performance.
  • Profile and optimize CUDA kernels and GPU utilization.
  • Add support for new model architectures while ensuring they meet performance standards.
  • Experiment with novel inference techniques and bring successful approaches into production.
  • Build tooling and benchmarks to measure inference performance.
  • Collaborate with applied ML engineers to serve models efficiently.

Requirements

  • 2+ years of experience in ML systems, inference optimization, or GPU programming.
  • Strong proficiency in Python and familiarity with C++.
  • Hands-on experience with LLM inference frameworks.
  • Deep understanding of GPU architecture and experience profiling GPU workloads.
  • Familiarity with LLM optimization techniques.
  • Experience with PyTorch and understanding of model execution on hardware.
  • Track record of measurably improving system performance.

Benefits

  • Competitive compensation and equity in a high-growth startup.
  • Comprehensive benefits package.

Categories

AI & ML, Backend, Data Engineering