
Filtering & Querying · Topic 136

Multi-vector search (multiple vectors per document).

Multi-vector search means each logical “document” is represented by more than one vector. Common cases: (1) chunked text, where each chunk has its own embedding, so one document has many vectors; (2) multiple aspects or views (e.g. a title embedding plus a body embedding); (3) multi-modal items (e.g. an image embedding plus a text embedding per item). The query is compared to all relevant vectors, and results are aggregated per document (e.g. max or sum of per-vector scores). Implementation choices affect index size, latency, and how documents with multiple matching chunks are scored.

Summary

  • Multi-vector search: each document has more than one vector—e.g. chunked text, multiple aspects (title + body), or multi-modal (image + text). Query is compared to relevant vectors; results are aggregated per document (max score, sum, etc.).
  • Store one vector per row with document ID and aggregate in app, or use a VDB with multi-vector/parent-child so one doc ID maps to many vectors. Chunked RAG: embed chunks, store with doc/conversation ID, run vector search, then re-rank or deduplicate by document.
  • Query either “search all vectors and group by document” or “search per vector type and merge.” Trade-offs: index size, latency, and how to score a document with multiple matching chunks (best chunk score vs. aggregate).
  • Pipeline: embed per chunk/aspect, store with doc ID, run vector search, group by doc, aggregate scores (max/sum) or re-rank. Practical tip: start with max-per-doc; try sum or learned combiner if you want to favor docs with many good chunks.

Implementation options

There are two main options. Store one vector per row with a document ID and aggregate in the application: either query once per vector type, or run one big search and group by doc ID. Alternatively, use a VDB that supports “multi-vector” or “parent-child” collections, so one document ID maps to many vectors and the engine returns document-level results. Chunked RAG is the typical scenario: embed each chunk, store chunk vectors with a document or conversation ID, run vector search, then re-rank or deduplicate by document. The pipeline: embed per chunk or aspect, store with doc ID, run vector search, group by doc, aggregate scores (max, sum) or re-rank.
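The “one vector per row, aggregate in the application” option can be sketched as below. This is a minimal illustration with NumPy and toy 2-D embeddings; the data, the `search_grouped` helper, and the doc IDs are all hypothetical, and a real system would use an ANN index rather than brute-force cosine scoring:

```python
import numpy as np

def search_grouped(query, vectors, doc_ids, top_k=2):
    """Cosine-score every chunk vector, then keep the best chunk score per document."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                       # one similarity score per chunk
    best = {}                            # doc ID -> max chunk score
    for doc, s in zip(doc_ids, scores):
        best[doc] = max(best.get(doc, -1.0), float(s))
    # Rank documents by their best chunk and return the top_k.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy store: two chunks belong to doc1, one to doc2.
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
doc_ids = ["doc1", "doc1", "doc2"]
print(search_grouped(np.array([1.0, 0.0]), vectors, doc_ids))
```

The grouping step is the key difference from single-vector search: without it, one document with many matching chunks would crowd out everything else in the result list.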

Querying and scoring

Querying can be “search all vectors and group by document” or “search per vector type and merge” (e.g. query against title index and body index separately, then combine). Trade-offs include index size, latency, and how to score a document that has several matching chunks.

Score a document either by taking its best chunk score (max) or by aggregating (sum, or a learned combiner). The trade-off: max favors a document with one strong match; sum favors documents with many good chunks. Practical tip: start with max-per-doc; switch to sum or a learned combiner if you want to favor docs with many relevant chunks.
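The max-vs-sum trade-off shows up even on tiny examples. In this sketch the chunk scores and document names are made up: one document has a single strong chunk, the other has several moderate ones, and the two aggregation rules rank them oppositely:

```python
# Hypothetical per-chunk scores for two documents.
chunk_scores = {
    "focused": [0.95],            # one strongly matching chunk
    "broad":   [0.6, 0.6, 0.6],   # several moderately matching chunks
}

# max-per-doc rewards the single best chunk; sum rewards breadth.
by_max = max(chunk_scores, key=lambda d: max(chunk_scores[d]))
by_sum = max(chunk_scores, key=lambda d: sum(chunk_scores[d]))
print(by_max, by_sum)  # → focused broad
```

A learned combiner generalizes both: it takes features like the top-k chunk scores and the match count, and learns the weighting from relevance labels.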

Frequently Asked Questions

What is multi-vector search?

Each logical document is represented by more than one vector: e.g. chunked text (one vector per chunk), multiple aspects (title + body), or multi-modal (image + text). The query is compared to all relevant vectors; results are aggregated per document. See RAG for chunked use cases.

How do I implement it?

Store one vector per row with a document ID and aggregate in the application; or use a VDB that supports multi-vector or parent-child collections, so one document ID maps to many vectors and the engine returns document-level results. For chunked RAG, the typical flow is: embed chunks, store with doc ID, search, then re-rank or deduplicate by document.

How do I score a document with multiple matching chunks?

Common approaches: take the best chunk score (max), or aggregate (sum, or a learned combiner). Trade-off between favoring documents with one strong match vs. several good matches. Depends on your re-ranking and relevance goals.

Can I search per vector type and merge?

Yes. Query against title index and body index separately, then combine results (e.g. with RRF or weighted fusion). Trade-offs include index size, latency, and merge logic.
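Reciprocal rank fusion is a simple way to merge the per-index result lists, since it needs only ranks, not comparable scores. A minimal sketch, assuming the two searches each return doc IDs in rank order (the lists and the `k=60` constant are illustrative):

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: fused score = sum over lists of 1 / (k + rank)."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical results from two separate searches over title and body indexes.
title_hits = ["d2", "d1", "d3"]
body_hits  = ["d2", "d4", "d1"]
print(rrf([title_hits, body_hits]))
```

Here `d2` wins because it ranks first in both lists, and `d1` beats the docs that appear in only one list. Weighted fusion is the score-based alternative, but it requires normalizing scores across indexes first.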