GrepJob
Featherless AI

Machine Learning Engineer — Inference Optimization

4 months ago
Remote, Worldwide
Mid Level / Senior

Responsibilities

  • Optimize inference latency, throughput, and cost for large-scale ML models in production.
  • Profile GPU/CPU inference pipelines and identify performance bottlenecks.
  • Implement and tune techniques such as quantization and KV-cache optimization.
  • Collaborate with research engineers to productionize new model architectures.
  • Build and maintain inference-serving systems.
  • Benchmark performance across hardware and cloud setups.
  • Improve system reliability, observability, and cost efficiency.
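The benchmarking work described above typically starts with careful latency measurement. As a minimal illustration (the function and parameter names here are hypothetical, not part of any Featherless AI codebase), a per-call latency harness reporting percentiles might look like:

```python
import time
import statistics

def benchmark(fn, warmup=10, iters=100):
    """Measure per-call latency of `fn` and report p50/p95/mean in ms.

    `fn`, `warmup`, and `iters` are illustrative names chosen for this
    sketch. Warmup runs are excluded so cold caches and lazy
    initialization don't skew the timed samples.
    """
    for _ in range(warmup):
        fn()  # warm caches / allocators before timing
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)  # ms
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

stats = benchmark(lambda: sum(range(10_000)))
```

Tail percentiles (p95/p99) rather than means are what usually drive user-facing latency targets, which is why the sketch sorts the samples instead of reporting only an average.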

Requirements

  • Strong experience in ML inference optimization or high-performance ML systems.
  • Solid understanding of deep learning internals.
  • Hands-on experience with PyTorch or similar frameworks.
  • Familiarity with GPU performance tuning and optimization.
  • Experience scaling inference for real users.
  • Comfortable working in fast-moving startup environments.

Benefits

  • Real ownership over performance-critical systems.
  • Direct impact on product reliability and unit economics.
  • Close collaboration with research, infra, and product teams.
  • Competitive compensation and meaningful equity at Series A.
  • A team that values engineering quality.

Tech Stack

PyTorch

Categories

AI & ML, Data Engineering