Member of Technical Staff - Multi-Modal, Vision

8 months ago

San Francisco, CA, USA

Mid Level / Senior

H1B Sponsor

Responsibilities

Lead the development of new model capabilities from task specification to deployment.
Enhance visual reasoning using reinforcement learning and preference optimization.
Optimize token efficiency through innovative encoder and connector designs.
Collaborate with pretraining, post-training, and infrastructure teams.
Ensure high-quality outputs through rigorous evaluation and iteration.

Requirements

Hands-on experience in training or evaluating vision-language models.
Ability to translate research ideas into scalable implementations.
Proficiency in Python and at least one deep learning framework.
M.S. or Ph.D. in Computer Science, Mathematics, or a related field, or equivalent industry experience.
Experience with multimodal training or data pipelines is preferred.

Benefits

Full ownership of work from architecture to deployment.
Competitive base salary with equity in a unicorn-stage company.
100% coverage of medical, dental, and vision premiums for employees and dependents.
401(k) matching up to 4% of base pay.
Unlimited PTO plus company-wide Refill Days throughout the year.

Categories

AI & ML, Backend, Data Science