Hybrid Search: Combining Vector + Keyword Search

Hybrid search runs both vector (semantic) search and keyword search (e.g. BM25) on the same query, then merges the result lists so you get the benefits of both: semantic recall (meaning) and lexical precision (exact terms). It is widely used in RAG, product search, and enterprise search when users expect both “similar meaning” and “contains these words.”

Summary

Hybrid search runs vector (semantic) and keyword (e.g. BM25) search on the same query and merges results for both semantic recall and lexical precision; common in RAG, product search, and enterprise search.
The vector path embeds the query and returns nearest neighbors; the keyword path uses the raw query text and ranks documents by term match. The two rankings are combined via rank fusion, often Reciprocal Rank Fusion (RRF), or via weighted scores when the score scales are comparable.
Hybrid search can be deployed as two systems (a vector DB plus a search engine) with a coordinator, or as a single system with both dense vectors and an inverted index. It improves over pure vector search when exact terms matter and over pure keyword search when synonyms and paraphrases matter.
Trade-off: RRF avoids score calibration but ignores magnitude; weighted fusion needs normalized scores. Full-text integration in one system simplifies deployment.
Practical tip: start with RRF for simplicity; tune weights or try weighted fusion when you have labeled data to optimize the mix.

How hybrid search works

The vector path uses an embedding of the query and returns nearest neighbors; the keyword path uses the raw query text and returns documents ranked by term match (e.g. BM25). The two rankings are combined with a fusion method—commonly Reciprocal Rank Fusion (RRF)—which doesn’t require comparable score scales. You can also combine scores with weights (e.g. 0.7 vector + 0.3 keyword) when both systems return normalized scores.
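
As an illustration of rank-based fusion, here is a minimal sketch of RRF in Python. Each document's fused score is the sum of 1 / (k + rank) over the lists it appears in, with ranks starting at 1. The function name, the document IDs, and the two hit lists are hypothetical; k = 60 is a commonly used default, not a value taken from this article.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Only ranks are used, so the input lists may come from systems
    whose raw scores are on completely different scales.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for the same query from the two paths.
vector_hits = ["d3", "d1", "d7"]   # nearest neighbors, best first
keyword_hits = ["d1", "d9", "d3"]  # BM25 ranking, best first

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists (d1 and d3 here) rise above documents found by only one path, which is the behavior hybrid search relies on.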

Pipeline: same query text → embed for vector path, tokenize for keyword path → run both searches (vector DB + keyword index) → merge result lists by RRF or weighted sum → return unified ranking.
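
The pipeline above can be sketched as a small coordinator function. This is only an outline under stated assumptions: `vector_search` and `keyword_search` are hypothetical callables standing in for a real vector DB client and keyword index (each takes a query string and a limit and returns document IDs, best first), and the stub backends below return fixed lists purely for demonstration.

```python
def hybrid_search(query, vector_search, keyword_search, top_k=5, rrf_k=60):
    """Run both searches on the same query text and merge the lists with RRF."""
    # Each backend handles its own preprocessing (embedding or tokenizing).
    vector_hits = vector_search(query, top_k)
    keyword_hits = keyword_search(query, top_k)

    # Merge by Reciprocal Rank Fusion: sum of 1 / (rrf_k + rank) per document.
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)

    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_k]

# Stub backends (assumptions for the example, not a real API).
def fake_vector_search(query, limit):
    return ["doc_a", "doc_b", "doc_c"][:limit]

def fake_keyword_search(query, limit):
    return ["doc_c", "doc_a", "doc_d"][:limit]

results = hybrid_search(
    "error code E42 on startup", fake_vector_search, fake_keyword_search, top_k=3
)
print(results)
```

In a two-system deployment the two calls are independent, so a coordinator could issue them concurrently and merge once both return.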

Implementation options

Hybrid search can be implemented as two separate systems (a vector DB plus a search engine) with a coordinator that merges results, or as a single system that supports both dense vectors and an inverted index.

Hybrid search improves over pure vector search when exact terms matter (names, codes, IDs) and over pure keyword search when synonyms and paraphrases matter. See the topic on full-text search integration within vector databases (VDBs) for how keyword search is commonly built into the same system. Trade-off: RRF avoids score calibration but ignores score magnitude; weighted fusion needs normalized scores and tuning. Practical tip: start with RRF for simplicity; tune weights when you have labeled data to optimize the mix.
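
For contrast with RRF, here is a minimal sketch of weighted score fusion. It assumes min-max normalization to bring both score sets onto a 0 to 1 scale (one of several possible normalization schemes, chosen here for brevity) and treats a document missing from one list as scoring 0 on that path. The function names and the score values are hypothetical; the 0.7 / 0.3 weights mirror the example weights mentioned earlier.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_fusion(vector_scores, keyword_scores, w_vector=0.7, w_keyword=0.3):
    """Combine normalized scores as w_vector * vector + w_keyword * keyword."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    fused = {
        doc: w_vector * v.get(doc, 0.0) + w_keyword * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    # Highest fused score first; ties broken by document ID.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical raw scores on very different scales.
vector_scores = {"d1": 0.82, "d2": 0.78, "d3": 0.40}  # e.g. cosine similarity
keyword_scores = {"d2": 12.5, "d4": 9.0, "d3": 2.0}   # e.g. raw BM25

ranking = weighted_fusion(vector_scores, keyword_scores)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Unlike RRF, this keeps score magnitude: d2 wins because it scores near the top on both paths. The cost is that the weights and the normalization scheme become parameters to tune, which is where labeled data helps.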

Frequently Asked Questions

What is hybrid search?

Vector (semantic) and keyword search run on the same query; result lists are merged so you get both meaning-based and term-based relevance. Common in RAG, product search, and enterprise search.

How are the two result lists combined?

Usually with Reciprocal Rank Fusion (RRF), which uses ranks only (no score normalization). When scores are comparable you can use weighted score fusion (e.g. 0.7 vector + 0.3 keyword).

When is hybrid better than pure vector or pure keyword?

Hybrid is better than pure vector search when exact terms (names, codes, IDs) matter, and better than pure keyword search when synonyms and paraphrases matter. Vector databases (VDBs) with built-in full-text search often support both in one system.

Do I need two separate systems for hybrid?

No. You can use a vector DB plus a search engine with a coordinator, or a single system that supports both dense vectors and an inverted index. See your VDB docs for full-text search integration.