Decagon

Senior Software Engineer, ML Infrastructure

about 1 month ago

Base Salary

$200k - $400k/yr

Responsibilities

  • Design and build distributed training platforms for LLM and multimodal fine-tuning.
  • Integrate state-of-the-art training algorithms into production pipelines.
  • Own inference architecture and multi-provider routing, including failover and optimization.
  • Lead initiatives to improve latency and cost efficiency across the training and serving stack.
  • Build evaluation and experimentation infrastructure for rapid iteration.
  • Drive technical direction, mentor engineers, and establish best practices for ML infrastructure.

Requirements

  • 6+ years building ML infrastructure or production systems at scale.
  • Deep experience with distributed training, including multi-node GPU clusters.
  • Strong understanding of LLM inference, latency optimization, and serving architecture.
  • Proven track record leading complex, multi-quarter technical projects.

Benefits

  • Take-what-you-need vacation policy.
  • Medical, Dental, and Vision benefits for you and your family.
  • Life Insurance and Disability Benefits.
  • Retirement Plan (e.g., 401(k), pension).
  • Parental Leave.
  • Fertility and family building benefits through Carrot.
  • Daily lunches and snacks in the office.
