Senior Software Engineer - Model Performance

5 months ago

San Francisco, CA, USA

Senior / Mid Level

H-1B Sponsor

Base Salary

$220k - $320k/yr

Responsibilities

Implement and productionize optimization techniques including quantization and continuous batching.
Dive deep into inference frameworks to debug issues and improve performance.
Profile and optimize CUDA kernels and GPU utilization.
Add support for new model architectures while ensuring they meet performance standards.
Experiment with novel inference techniques and bring successful approaches into production.
Build tooling and benchmarks to measure inference performance.
Collaborate with applied ML engineers to ensure efficient model serving.

Requirements

2+ years of experience in ML systems, inference optimization, or GPU programming.
Strong proficiency in Python and familiarity with C++.
Hands-on experience with LLM inference frameworks.
Deep understanding of GPU architecture and experience profiling GPU workloads.
Familiarity with LLM optimization techniques.
Experience with PyTorch and an understanding of how models execute on hardware.
Track record of measurably improving system performance.

Benefits

Competitive compensation and equity in a high-growth startup.
Comprehensive benefits package.

Tech Stack

C++, Docker, Kubernetes, Python, PyTorch

Categories

AI & ML, Backend, Data Engineering