5 months ago
Base Salary
$220k - $320k/yr
Responsibilities
- Implement and productionize optimization techniques such as quantization and continuous batching.
- Dive deep into inference frameworks to debug issues and improve performance.
- Profile and optimize CUDA kernels and GPU utilization.
- Add support for new model architectures while meeting performance standards.
- Experiment with novel inference techniques and bring successful approaches into production.
- Build tooling and benchmarks to measure inference performance.
- Collaborate with applied ML engineers to serve models efficiently.
Requirements
- 2+ years of experience in ML systems, inference optimization, or GPU programming.
- Strong proficiency in Python and familiarity with C++.
- Hands-on experience with LLM inference frameworks.
- Deep understanding of GPU architecture and experience profiling GPU workloads.
- Familiarity with LLM optimization techniques such as quantization and continuous batching.
- Experience with PyTorch and an understanding of how models execute on hardware.
- Track record of measurably improving system performance.
Benefits
- Competitive compensation and equity in a high-growth startup.
- Comprehensive benefits package.
