Abridge

Machine Learning Infrastructure Engineer - Model Inference

9 months ago
San Francisco, CA, USA
Mid Level / Senior
H-1B Sponsor

Base Salary

$221k - $260k/yr

Responsibilities

  • Design, deploy, and maintain scalable Kubernetes clusters for AI model inference and training.
  • Develop, optimize, and maintain ML model serving infrastructure for high performance and low latency.
  • Collaborate with ML and product teams to scale backend infrastructure for AI-driven products.
  • Optimize compute-heavy workflows and enhance GPU utilization for ML workloads.
  • Build a robust model API orchestration system.
  • Define and implement strategies for scaling infrastructure as the company grows.

Requirements

  • Strong experience in building and deploying machine learning models in production environments.
  • Deep understanding of container orchestration and distributed systems architecture.
  • Expertise in Kubernetes administration, including custom resource definitions and cluster management.
  • Experience developing APIs and managing distributed systems for batch and real-time workloads.
  • Excellent communication skills to interface between research and product engineering.

Benefits

  • 14 paid holidays and flexible PTO for salaried employees.
  • Comprehensive health plans including medical, dental, and vision coverage.
  • Generous HSA contributions for those on a High Deductible Health Plan.
  • Paid parental leave and family forming benefits.
  • 401(k) matching to help invest in your future.
  • Personal device allowance and access to pre-tax benefits.
  • Monthly contributions for fitness and professional development.
  • Dedicated mental health support and paid sabbatical leave after 5 years.

Tech Stack

Ansible, Kubernetes, PyTorch, TensorFlow, Terraform