
Machine Learning Engineer — Inference Optimization
Featherless AI
Remote, Worldwide
Mid Level / Senior
Responsibilities
- Optimize inference latency, throughput, and cost for large-scale ML models in production.
- Profile GPU/CPU inference pipelines and identify bottlenecks.
- Implement and tune techniques such as quantization and KV-cache optimization.
- Collaborate with research engineers to productionize new model architectures.
- Build and maintain inference-serving systems.
- Benchmark performance across hardware and cloud setups.
- Improve system reliability, observability, and cost efficiency.
Requirements
- Strong experience in ML inference optimization or high-performance ML systems.
- Solid understanding of deep learning internals.
- Hands-on experience with PyTorch or similar frameworks.
- Familiarity with GPU performance tuning and optimization techniques.
- Experience scaling inference for real users.
- Comfortable working in fast-moving startup environments.
Benefits
- Real ownership over performance-critical systems.
- Direct impact on product reliability and unit economics.
- Close collaboration with research, infra, and product teams.
- Competitive compensation and meaningful equity at a Series A-stage company.
- A team that values engineering quality.
Tech Stack
PyTorch
Categories
AI & ML
Data Engineering