Anyscale

Distributed LLM Inference Engineer

4 days ago
Palo Alto, CA, USA or San Francisco, CA, USA
Mid Level / Senior
H1B sponsorship available

Base Salary

$170k - $247k/yr

Responsibilities

  • Collaborate with product teams to deliver end-to-end solutions for batch and online inference.
  • Integrate Ray Data and LLM engines to optimize large-scale ML inference.
  • Work with open-source software like vLLM and contribute improvements to the community.
  • Stay updated on state-of-the-art practices in the open-source and research communities.

Requirements

  • Familiarity with running ML inference at large scale with high throughput and low latency.
  • Experience with deep learning frameworks such as PyTorch.
  • Solid understanding of distributed systems and ML inference challenges.
  • Bonus points for knowledge of ML systems and experience using Ray.
  • Experience engaging with the communities around LLM engines such as vLLM, and contributing to deep learning frameworks.

Benefits

  • Stock options.
  • Healthcare plans with 99% premium coverage for employees and dependents.
  • 401k retirement plan.
  • Education and wellbeing stipend.
  • Paid parental leave.
  • Fertility benefits.
  • Paid time off.
  • Commute reimbursement.
  • 100% of in-office meals covered.

Tech Stack

PyTorch, TensorFlow

Categories

AI & ML, Data Engineering