Responsibilities
- Build state-of-the-art multimodal data mining and semantic search solutions to power AV product development.
- Develop data understanding platform infrastructure for real-time querying/vector databases and batch/stream processing using technologies like Ray, Spark, Lance, or similar.
- Deliver end-to-end data mining solutions that span onboard (C++) and offboard (ML & Data Infra) infrastructure to accelerate AV product development.
- Develop end-to-end solutions for real-time semantic search services (text, images, videos) and vector databases.
- Identify key issues in existing ML infrastructure and proactively improve system performance.
- Build low-latency, high-throughput batch or stream processing pipelines.
- Drive technical discussions across multiple organizations and deliver solutions on time.
- Architect and tune ETL pipelines to maximize GPU/CPU/RAM utilization.
- Write readable and high-performance Python/C++ code.
Requirements
- Experience with both ML platforms and building ML-based applications.
- Proven track record of building scalable, reliable infrastructure in a fast-paced environment.
- Ability to collaborate effectively across teams.
- Experience building or using ML infrastructure for a large number of customer teams.
- Deep understanding of design trade-offs with the ability to articulate those trade-offs.
- Experience in building ML models or infrastructure in domains such as autonomous vehicles is desirable.
- Experience with model training, model optimization, or large data processing pipelines.
- Prior experience in autonomous vehicles (AV) is a plus.
- 6+ years of experience with multimodal data indexing and inference pipelines.