Moonlake

Member of Technical Staff - Data & ML Infra Engineer

9 months ago
San Francisco, CA, USA
Mid Level / Senior
H1B Sponsor

Responsibilities

  • Optimize GPU performance using CUDA and Triton kernels.
  • Enhance the serving stack with TensorRT-LLM and Triton Inference Server.
  • Implement parallelism techniques such as FSDP and tune NCCL communication.
  • Work on quantization and PEFT strategies for model serving.
  • Manage systems for observability and autoscaling.

Requirements

  • Experience with GPU performance optimization and CUDA.
  • Familiarity with serving stacks like TensorRT-LLM and Triton Inference Server.
  • Knowledge of parallelism techniques and NCCL tuning.
  • Experience with quantization methods such as AWQ and GPTQ.
  • Background in infrastructure-heavy startups like Databricks or Roblox.

Benefits

  • On-site, in-person team environment in San Mateo.

Tech Stack

Argo CD, Grafana, Kubernetes, Prometheus

Categories

AI & ML, Data Engineering