Member of Technical Staff - Efficient ML

5 months ago

San Francisco, CA, USA
Mid Level / Senior

Responsibilities

Improve training efficiency by optimizing dataloaders and applying gradient checkpointing.
Optimize GPU performance using Nsight profiling and CUDA kernels.
Implement low-latency serving and continuous batching for inference.
Manage multi-node jobs with SLURM and Kubernetes.
Ensure reliability and determinism in the machine learning infrastructure.

Requirements

Experience with machine learning frameworks and optimization techniques.
Proficiency in GPU programming and performance profiling tools.
Familiarity with SLURM and Kubernetes for job management.
Knowledge of quantization, distillation, and pruning methods.
Strong problem-solving skills and ability to work in a team environment.

Categories

AI & ML, Data Engineering