Basic Fundamentals · Topic 17

Common use cases for VDBs (Recommendation, Chatbots)

Vector databases power several core applications: recommendation (find items similar to what the user liked), semantic search (find content by meaning, not just keywords), RAG chatbots (retrieve relevant docs and feed them to an LLM for grounded answers), deduplication (find near-duplicate items), and anomaly detection (flag points far from “normal” clusters).

Summary

  • Recommendation: embed items (and optionally users); query with the user's history or a single item; return nearest neighbors as suggestions.
  • Semantic search: embed corpus and query; return nearest chunks/documents by meaning; see semantic search.
  • RAG / Chatbots: embed knowledge chunks; for each question run semantic search, pass results as context to LLM; see RAG stack.
  • Also: anomaly detection, long-term memory, LLM caching; all rely on fast ANN over embeddings.

Recommendation

In recommendation, each item (product, video, article) is embedded; you query with the user’s history (e.g. average of recently viewed vectors) or a single item and return the nearest neighbors as suggestions. “Users who liked this also liked…” and “similar items” are both nearest-neighbor queries over item embeddings. You can add filters (e.g. category, in-stock) so results are relevant and actionable. The same primitive—fast ANN over embeddings—powers this and the other use cases below.

Item-to-item recommendation uses one item’s vector as the query. User-to-item can use an aggregate of the user’s interaction vectors (e.g. mean or weighted sum) as the query vector. Cold start for new users or items can be handled with fallbacks (e.g. popular items, random) or with side-information embeddings.
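Both query styles above reduce to one nearest-neighbor lookup. A minimal sketch, using a brute-force in-memory search in place of a real vector database, with toy illustrative embeddings (real ones come from a trained model):

```python
import numpy as np

# Toy item embeddings (item_id -> vector). Values are illustrative;
# in practice these come from a model and live in a vector database.
items = {
    "item_a": np.array([1.0, 0.0, 0.0]),
    "item_b": np.array([0.9, 0.1, 0.0]),
    "item_c": np.array([0.0, 1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(history_ids, k=2):
    """User-to-item: average the user's history vectors into one query
    vector, then return the nearest items not already seen."""
    query = np.mean([items[i] for i in history_ids], axis=0)
    scored = [(cosine(query, v), i)
              for i, v in items.items() if i not in history_ids]
    return [i for _, i in sorted(scored, reverse=True)[:k]]
```

Item-to-item is the special case where the history contains exactly one item: `recommend(["item_a"])` ranks `item_b` first because its vector is closest.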

Semantic search

Semantic search finds content by meaning. You embed the corpus (e.g. document chunks) and the query; the vector DB returns the nearest chunks. Use it for site search, knowledge bases, and “find docs like this.” Often combined with hybrid search (vector + keyword) and re-ranking for better relevance.
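The retrieval loop can be sketched end to end. Here a bag-of-words count vector is only a stand-in for a real embedding model (e.g. a sentence transformer), and the linear scan stands in for the vector DB's ANN index; corpus strings are illustrative:

```python
import math
from collections import Counter

# Toy corpus of chunks (illustrative).
corpus = [
    "vector databases store embeddings",
    "cats are popular pets",
    "semantic search finds content by meaning",
]
vocab = sorted({w for doc in corpus for w in doc.split()})

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query, k=1):
    """Embed the query, rank chunks by similarity, return the top k."""
    qv = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]
```

With a real embedding model, a query like "find documents about pets" would also match the cat chunk despite sharing no keywords; that gap is exactly what semantic embeddings close.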

Chunking strategy and embedding model choice directly affect result quality. For long documents, splitting into overlapping passages and embedding each passage gives finer-grained retrieval than embedding the whole document once.
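One common splitting scheme is a sliding character window with overlap, so passage boundaries don't cut a relevant sentence in half. A minimal sketch (the size and overlap values are illustrative; many pipelines chunk by tokens or sentences instead):

```python
def chunk(text, size=200, overlap=50):
    """Split text into overlapping windows of `size` characters,
    where consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then embedded and stored individually, so a query can retrieve just the relevant passage rather than a whole document.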

RAG and chatbots

In chatbots and RAG, you embed chunks of knowledge (docs, FAQs, past conversations), then for each user question you run semantic search to fetch the most relevant chunks and pass them as context to the LLM. That’s the RAG stack: retrieve with a vector DB, then generate with an LLM so answers are grounded in your data. Chatbots may also use the VDB for long-term memory (store and retrieve past interactions by similarity).

RAG quality depends on retrieval recall (are the right chunks in the top k?) and on context length limits of the LLM. Tuning k, chunk size, and re-ranking helps. For multi-turn chatbots, including recent conversation turns in the query or storing turn embeddings improves context relevance.
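The assembly step between retrieval and generation can be sketched as follows, assuming the vector DB has already returned scored chunks; the prompt wording and the `max_chars` budget are illustrative:

```python
def build_rag_prompt(question, scored_chunks, k=3, max_chars=2000):
    """Take the top-k retrieved (score, chunk) pairs, trim to a rough
    context budget, and assemble a grounded prompt for the LLM."""
    top = sorted(scored_chunks, key=lambda sc: sc[0], reverse=True)[:k]
    context, used = [], 0
    for _, text in top:
        if used + len(text) > max_chars:
            break  # respect the LLM's context limit
        context.append(text)
        used += len(text)
    return (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context) +
        f"\n\nQuestion: {question}\nAnswer:"
    )
```

The prompt string is then sent to the LLM; tuning `k` trades retrieval recall against context length.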

Other use cases

Other uses include anomaly detection (e.g. embed log lines or transactions; flag points far from “normal” clusters), long-term memory for agents, and LLM response caching (cache prompts/context by embedding and reuse cached responses for similar queries). Deduplication works the same way: embed items and flag near-duplicates by small distance. The common thread is: turn data into vectors, store them in a vector database, and query by similarity.
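As an example of the caching idea, here is a minimal semantic cache sketch: a linear scan over stored prompt embeddings stands in for a real vector DB lookup, and the similarity threshold is illustrative:

```python
import numpy as np

class SemanticCache:
    """Sketch of an LLM response cache keyed by embedding similarity."""

    def __init__(self, threshold=0.95):
        self.entries = []  # list of (prompt_embedding, cached_response)
        self.threshold = threshold

    def put(self, query_vec, response):
        self.entries.append((np.asarray(query_vec, dtype=float), response))

    def get(self, query_vec):
        q = np.asarray(query_vec, dtype=float)
        for vec, resp in self.entries:
            sim = float(vec @ q / (np.linalg.norm(vec) * np.linalg.norm(q)))
            if sim >= self.threshold:
                return resp  # a similar prompt was seen: reuse its answer
        return None  # cache miss: call the LLM and put() the result
```

A near-paraphrase of a cached prompt embeds close to it and hits the cache, skipping an LLM call; an unrelated prompt falls below the threshold and misses.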

Frequently Asked Questions

Do I need a separate vector DB for recommendations vs. search?

Not necessarily. You can keep separate collections (e.g. products, documents) in one VDB: the architecture and query path are the same, while the embedding models and metadata differ per use case.

How does RAG differ from plain semantic search?

RAG adds a generation step: you retrieve relevant chunks with the vector DB, then pass them as context to an LLM that generates an answer. So RAG = retrieval (vector DB) + generation (LLM). See the RAG stack.

Can I use a vector DB for real-time recommendations?

Yes. If ingestion is incremental and the index supports it (e.g. HNSW), new items can appear in nearest neighbor results soon after insert. Latency is usually low enough for real-time; see real-time vs. offline indexing.

What about cold start for new users or items?

New users and items have few or no interaction vectors, so similarity signals are weak. Mitigations: use metadata (e.g. category), fall back to popular or random items, or use side-information embeddings. See cold start with embeddings.