
Member of Technical Staff - Efficient ML
Embedding VC
4 months ago
San Francisco, CA, USA
Mid Level / Senior
Responsibilities
- Improve training efficiency through faster dataloaders and gradient checkpointing.
- Optimize GPU performance using Nsight profiling and CUDA kernels.
- Implement low-latency serving and continuous batching for inference.
- Manage multi-node jobs with SLURM and Kubernetes.
- Ensure reliability and determinism in the machine learning infrastructure.
Requirements
- Experience with machine learning frameworks and optimization techniques.
- Proficiency in GPU programming and performance profiling tools.
- Familiarity with SLURM and Kubernetes for job management.
- Knowledge of quantization, distillation, and pruning methods.
- Strong problem-solving skills and ability to work in a team environment.
Tech Stack
Kubernetes
Categories
AI & ML
Data Engineering