Featherless AI

Machine Learning Engineer — Training Optimization

4 months ago
Remote, Worldwide
Mid Level / Senior

Responsibilities

  • Optimize large-scale model training pipelines for throughput, convergence, stability, and cost.
  • Improve distributed training strategies including data, model, and pipeline parallelism.
  • Tune optimizers, schedulers, batch sizing, and precision settings.
  • Reduce training time and compute costs through profiling and bottleneck analysis.
  • Collaborate with researchers on architecture-aware training strategies.
  • Build and maintain robust training infrastructure for checkpointing and fault tolerance.
  • Evaluate and integrate new training techniques and own training performance metrics.

Requirements

  • Strong experience training large neural networks or similarly large models.
  • Hands-on experience with training optimization techniques.
  • Solid understanding of backpropagation, optimization algorithms, and training dynamics.
  • Experience with distributed systems for machine learning training.
  • Proficiency in PyTorch is required.
  • Comfortable working close to the hardware and its constraints, such as GPUs and memory.

Benefits

  • Real ownership at a Series-A stage company.
  • Opportunity to work on cutting-edge models and training systems at scale.
  • Small, highly technical team with fast feedback loops.
  • Strong emphasis on engineering quality and research rigor.
  • Competitive compensation with meaningful equity.

Tech Stack

PyTorch

Categories

AI & ML
Data Engineering