San Francisco, CA, USA
Mid Level / Senior
Responsibilities
- Train and evaluate Vision-Language Models specialized for motion understanding.
- Design and scale GPU-accelerated pipelines for training and inference on multi-modal data.
- Build evaluation frameworks for benchmarking spatiotemporal reasoning and localization accuracy.
- Develop data curation loops that use models to generate and refine datasets.
- Publish research while delivering customer-facing features.
Requirements
- Strong proficiency in Python and PyTorch.
- Research experience in foundation models or multi-modal learning.
- Ability to run experiments end-to-end independently.
- Experience training models on video or sensor data.
- Understanding of retrieval systems and GPU optimization.
