5 months ago
Remote, Worldwide +2 more
Senior / Staff+
H-1B Sponsor
Responsibilities
- Implement and optimize inference kernels for various edge hardware architectures.
- Develop quantization strategies to maximize compression while preserving model quality.
- Contribute to open-source inference frameworks like llama.cpp.
- Profile and optimize end-to-end inference pipelines for low latency.
- Collaborate with ML researchers to identify optimization opportunities.
Requirements
- 5+ years of experience in systems programming with strong C++ proficiency.
- Experience in embedded software engineering or resource-constrained systems.
- Understanding of ML fundamentals at the linear algebra level.
- Familiarity with hardware architecture concepts like cache hierarchies and memory bandwidth.
- Prior contributions to inference frameworks are a plus.
Benefits
- Competitive base salary with equity in a unicorn-stage company.
- 100% coverage of medical, dental, and vision premiums for employees and dependents.
- 401(k) matching up to 4% of base pay.
- Unlimited PTO plus company-wide Refill Days throughout the year.
