
Machine Learning Engineer — Inference Optimization
Featherless AI
Remote, Worldwide
Mid Level / Senior
Responsibilities
- Optimize inference latency, throughput, and cost for large-scale ML models in production.
- Profile GPU/CPU inference pipelines and identify bottlenecks.
- Implement and tune techniques such as quantization and KV-cache optimization.
- Collaborate with research engineers to productionize new model architectures.
- Build and maintain inference-serving systems.
- Benchmark performance across hardware and cloud setups.
- Improve system reliability, observability, and cost efficiency.
Requirements
- Strong experience in ML inference optimization or high-performance ML systems.
- Solid understanding of deep learning internals.
- Hands-on experience with PyTorch or similar frameworks.
- Familiarity with GPU performance tuning and optimization techniques.
- Experience scaling inference for real users.
- Comfortable working in fast-moving startup environments.
Benefits
- Real ownership over performance-critical systems.
- Direct impact on product reliability and unit economics.
- Close collaboration with research, infra, and product teams.
- Competitive compensation and meaningful equity at a Series A-stage company.
- A team that values engineering quality.
Tech Stack
PyTorch
Categories
AI & ML
Data Engineering