Reddit

Staff Machine Learning Engineer, AI Serving

about 4 hours ago
Remote, United States
Staff+
H1B Sponsor

Base Salary

$253k - $355k/yr

Responsibilities

  • Lead the design, implementation, and maintenance of a GPU-based model serving system.
  • Develop ML and Generative AI systems in cloud environments on Kubernetes.
  • Rapidly prototype and create a high-performance feature hydration and processing system.
  • Establish a unified GPU model export framework for optimized inference models.
  • Implement real-time ML observability to track feature and model performance.
  • Work on serving LLMs online at scale.
  • Build an end-to-end inference performance benchmarking framework.
  • Understand multi-cluster compute environments and network topology for ML inference.

Requirements

  • 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
  • Experience operating orchestration systems like Kubernetes at scale.
  • Deep knowledge of cloud technologies for ML platforms, including AWS and Google Cloud.
  • Proficiency in programming languages and frameworks such as Go and Python.
  • Excellent communication skills for articulating technical concepts to non-technical stakeholders.
  • Strong focus on scalability, reliability, performance, and user experience.
  • Knowledge of model serving, inference pipelines, and observability for AI systems is a plus.
  • Strong proficiency in Python and experience with AI/ML frameworks like Triton and PyTorch.

Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs.
  • 401(k) with Employer Match.
  • Global Benefit programs that fit your lifestyle.
  • Family Planning Support.
  • Gender-Affirming Care.
  • Mental Health & Coaching Benefits.
  • Flexible Vacation & Paid Volunteer Time Off.
  • Generous Paid Parental Leave.

Tech Stack

AWS, Go, Google Cloud, Kubernetes, Python, PyTorch, Terraform

Categories

AI & ML, Data Science