Filtering & Querying · Topic 142

Full-text search integration within VDBs

Full-text search (keyword matching, stemming, BM25-style scoring) is often integrated with vector search to enable hybrid search, which combines semantic similarity with exact keyword matching. A vector database (VDB) may embed its own inverted index or integrate with an external search engine. At query time it runs both an approximate nearest neighbor (ANN) vector search and a full-text search, then merges the two result lists (e.g. with Reciprocal Rank Fusion (RRF) or weighted score fusion). This serves use cases that need both “meaning” and “exact term” matching in one system.
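For readers unfamiliar with BM25-style scoring, here is a minimal sketch of Okapi BM25 over an in-memory list of documents. It assumes a naive regex tokenizer without stemming, the Lucene-style IDF variant, and the common defaults k1 = 1.2 and b = 0.75. None of these choices comes from a particular VDB, and `bm25_scores` is a hypothetical helper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores
```

With documents such as "error code E1234 on startup" and "the app fails to start", a query for E1234 scores only the document containing that exact token. That identifier-style match is what the keyword side of a hybrid search is meant to catch.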

Summary

Full-text search (keywords, stemming, BM25) is integrated with vector search for hybrid search, combining semantic similarity and exact keyword matching in one system.
VDBs either embed an inverted index or integrate with an external search engine. Each query runs both vector ANN and full-text search, and the results are merged (e.g. RRF or weighted scores).
Architecture: built-in vs. external search engine; text payloads are indexed alongside vectors (see vector vs. keyword search for the trade-offs). Pipeline: index text and vectors, run both searches, merge the results. Practical tip: use built-in search when available; use an external engine for advanced full-text needs.

Integration options

Some VDBs embed an inverted index over the text payloads stored with each vector; others integrate with an external search engine such as Elasticsearch. A built-in index simplifies deployment and keeps vector and keyword search in one system; an external engine offers mature full-text features (stemming, analyzers, phrase search). Either way, each query runs both vector ANN and full-text search, and the results are merged.
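The built-in option can be illustrated with a toy in-process collection that stores each vector together with its text payload and maintains an inverted index (term → set of document ids) over that payload. The class `HybridCollection` and its `upsert`/`keyword_match` methods are hypothetical names for this sketch, not any VDB's API. A real engine would add analyzers, stemming, compressed posting lists, and persistence.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class HybridCollection:
    """Hypothetical toy store keeping vectors and text payloads side by side."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    payloads: dict[str, str] = field(default_factory=dict)
    # Inverted index: term -> ids of documents whose payload contains the term.
    postings: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def upsert(self, doc_id: str, vector: list[float], text: str) -> None:
        # When replacing a document, drop its old postings first.
        if doc_id in self.payloads:
            for term in set(tokenize(self.payloads[doc_id])):
                self.postings[term].discard(doc_id)
        self.vectors[doc_id] = vector
        self.payloads[doc_id] = text
        for term in set(tokenize(text)):
            self.postings[term].add(doc_id)

    def keyword_match(self, query: str) -> set[str]:
        # Boolean AND: ids of documents containing every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Keeping the postings in the same object as the vectors is what the built-in approach buys: one upsert updates both, whereas an external engine needs a separate write path that must be kept in sync.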

Pipeline: index text payloads (inverted index) alongside vectors → at query time, run vector ANN and full-text search in parallel → merge the result lists (e.g. RRF or weighted scores). Trade-off: built-in is simpler to operate; external gives more control and features. Practical tip: start with built-in search if the VDB offers it; move to an external engine when you need advanced full-text features or already have an Elasticsearch deployment.
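The pipeline can be sketched end to end under heavy simplifications: an exact brute-force cosine ranking stands in for the ANN index, a count of matching query terms stands in for BM25, and the two ranked lists are merged with RRF using the commonly cited constant k = 60. All function names are hypothetical. A real VDB runs both searches inside the engine (or fans out to an external engine) rather than in Python threads.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    # Stand-in for ANN: exact cosine ranking over all vectors.
    return sorted(vectors, key=lambda d: cosine(query_vec, vectors[d]), reverse=True)[:k]


def keyword_search(query: str, texts: dict[str, str], k: int) -> list[str]:
    # Stand-in for BM25: rank by number of distinct query terms present.
    terms = set(query.lower().split())
    hits = {d: len(terms & set(t.lower().split())) for d, t in texts.items()}
    ranked = sorted((d for d, s in hits.items() if s > 0), key=lambda d: hits[d], reverse=True)
    return ranked[:k]


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: each list adds 1 / (k + rank) per document.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query: str, query_vec: list[float], vectors: dict[str, list[float]],
                  texts: dict[str, str], k: int = 10) -> list[str]:
    # Run both searches concurrently, then merge their rankings.
    with ThreadPoolExecutor(max_workers=2) as pool:
        vec_future = pool.submit(vector_search, query_vec, vectors, k)
        kw_future = pool.submit(keyword_search, query, texts, k)
        return rrf([vec_future.result(), kw_future.result()])[:k]


vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {"a": "reset your password", "b": "account recovery steps", "c": "error code E1234"}
print(hybrid_search("E1234 password", [1.0, 0.0], vectors, texts, k=3))  # → ['a', 'c', 'b']
```

Document c ranks last by vector similarity but matches the code E1234 exactly, so RRF lifts it above b.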

Indexing and merge strategies

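Text payloads (e.g. a document body) are tokenized and indexed for keyword matching. Merge strategies are the same as in general hybrid search: RRF, which uses only ranks and therefore needs no score normalization, or weighted score fusion, which requires both score sets to be on a comparable scale (BM25 scores have no fixed range, unlike cosine similarity), usually by normalizing each list first. See hybrid search and weighting scores for details.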
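Weighted fusion only makes sense once both score sets share a scale: cosine similarity lies in [-1, 1], while BM25 scores have no fixed range. The sketch below assumes min-max normalization per result list and a single weight `alpha` for the vector side; both are common choices rather than the only ones, and `min_max`/`weighted_fusion` are hypothetical helpers. RRF avoids this step because it uses only ranks.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_fusion(vector_scores: dict[str, float], keyword_scores: dict[str, float],
                    alpha: float = 0.5) -> list[str]:
    """Fuse normalized scores; alpha weights the vector side, 1 - alpha the keyword side."""
    v = min_max(vector_scores)
    kw = min_max(keyword_scores)
    fused: dict[str, float] = {}
    for d in v.keys() | kw.keys():
        # A document missing from one list contributes 0 from that side.
        fused[d] = alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
    return sorted(fused, key=fused.get, reverse=True)


vector_scores = {"a": 0.91, "b": 0.88, "c": 0.42}  # cosine similarities
keyword_scores = {"c": 12.7, "a": 3.1, "b": 1.0}   # raw BM25 scores
print(weighted_fusion(vector_scores, keyword_scores))  # → ['a', 'c', 'b']
```

Without normalization, the raw BM25 value 12.7 would swamp every cosine score; after rescaling, `alpha` controls the balance directly (alpha = 1.0 reproduces the pure vector ranking).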

Frequently Asked Questions

What is full-text search integration in a VDB?

It means keyword matching, stemming, and BM25-style scoring are built into the VDB or provided by an integrated search engine, so you can run hybrid search (vector for semantics plus keyword for exact terms) in one system. The two result lists are merged with RRF or weighted scores.

Built-in vs. external search engine?

Some VDBs embed an inverted index; others integrate with Elasticsearch or a similar engine. Built-in simplifies deployment; external offers mature full-text features. Either way, queries run both vector ANN and full-text search, then merge the results.

When do I need full-text with vector search?

When users need both semantic (similar meaning) and exact term matching (names, codes, IDs). See vector vs. keyword search and RAG use cases.

How are results merged?

Typically with Reciprocal Rank Fusion (RRF), which is rank-based and needs no score normalization, or with weighted score fusion after both score sets are normalized to a comparable scale. This is the same as in general hybrid search.