What is a “Dense Vector” vs. a “Sparse Vector”?
A dense vector has most of its dimensions set to non-zero values; a sparse vector has only a small fraction of non-zero dimensions (the rest are zero). In practice, embeddings from neural models are usually dense, while bag-of-words or TF–IDF style representations are often sparse. Vector databases commonly optimize for dense vectors, though some support sparse similarity (e.g. overlap metrics) as well.
Summary
- Dense: most dimensions non-zero, fixed length (e.g. 768, 1536); from neural embeddings; compared with cosine or L2.
- Sparse: few non-zero dimensions, often very high nominal dimension (e.g. vocabulary size); from BM25, bag-of-words; compared with Jaccard or overlap; related to keyword search.
- Most vector DBs and ANN indexes are built for dense vectors; some support hybrid (dense + sparse).
Dense vectors
Dense vectors are fixed-length (e.g. 768 or 1536 dimensions) and every position carries information. They’re produced by models that compress input into a continuous latent space, so similarity is measured with cosine similarity or L2 distance. Most vector databases and ANN indexes are built for dense vectors because the math (dot product, L2) is regular and easy to accelerate with SIMD or GPUs. Dense embeddings from transformers or CNNs are the standard input for semantic search and vector query pipelines.
The fixed dimensionality is determined by the embedding model. All vectors in a collection share the same dimension, which simplifies storage (contiguous arrays) and index structures. Because most dimensions are non-zero, distance computations typically iterate over the full vector, which hardware can do very efficiently with vectorized instructions.
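The two distance measures mentioned above are straightforward to sketch in plain Python (production systems would use a vectorized library such as NumPy or an ANN engine; the vectors here are made-up toy values):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2(a, b):
    """Euclidean (L2) distance between two dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 4-dimensional dense vectors; real embeddings have hundreds of dims.
v1 = [0.1, 0.8, 0.3, 0.4]
v2 = [0.2, 0.7, 0.1, 0.5]
print(cosine(v1, v2))
print(l2(v1, v2))
```

Note that both loops iterate over every dimension, which is exactly the regular access pattern that SIMD and GPU hardware accelerate well.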
Sparse vectors
Sparse vectors can have very high nominal dimension (e.g. vocabulary size) with only a few non-zero entries. They're common in traditional text retrieval: bag-of-words, TF–IDF, and BM25 all produce sparse representations. Similarity is often measured with Jaccard or other overlap metrics, which makes sparse retrieval a close relative of keyword search. Storing them efficiently means keeping only (index, value) pairs rather than a full array. Some systems support hybrid search, combining dense (semantic) and sparse (keyword) signals for better results. In short: dense vectors are full, learned, and semantic; sparse vectors have few non-zeros and are usually lexical or count-based.
Sparse representations scale with vocabulary or feature set size; the actual storage and comparison cost scale with the number of non-zero entries per vector. That makes them suitable for very high-dimensional discrete features where most dimensions are zero for any given item.
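A minimal sketch of the (index, value) representation: a Python dict keyed by dimension index stands in for the pair list, and the dot product iterates only over non-zero entries (the term IDs and weights below are made up for illustration):

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dim_index: value} dicts.

    Cost scales with the number of non-zeros, not the nominal dimension.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer non-zeros
    return sum(v * b[i] for i, v in a.items() if i in b)

# Nominal dimension could be a 100k-word vocabulary; only 2-3 entries are set.
doc   = {102: 1.5, 4051: 0.8, 90211: 2.0}   # e.g. BM25-style term weights
query = {102: 1.0, 90211: 1.0}
print(sparse_dot(doc, query))  # 3.5 — only shared dimensions contribute
```

Only the dimensions present in both dicts contribute, so a document with 50 non-zero terms costs the same to score whether the vocabulary has 10 thousand or 10 million entries.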
When to use which
Use dense when you care about meaning and similarity across paraphrases, synonyms, or modalities (e.g. CLIP for text–image). Use sparse when exact term match or lexical overlap matters (e.g. product IDs, codes, or when users type precise keywords). Many production systems use both: dense for recall and semantic coverage, sparse (or BM25) for precision, combined via RRF or weighted fusion. See hybrid search for combining vector and keyword search.
Hybrid setups often give the best of both: semantic recall from dense vectors and exact-match or lexical boost from sparse/BM25. The fusion method (RRF vs. weighted sum) and weights are tunable per use case.
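Reciprocal rank fusion is simple enough to sketch directly. This is a generic illustration, not any particular database's API; the document IDs and the conventional constant k=60 are assumptions:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each retriever contributes 1/(k + rank)
    per document, and documents are re-sorted by the summed score."""
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["d3", "d1", "d7"]   # e.g. from an ANN index over embeddings
sparse_hits = ["d1", "d9", "d3"]   # e.g. from BM25 / inverted index
print(rrf([dense_hits, sparse_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

Because RRF works on ranks rather than raw scores, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales; a weighted sum would instead require normalizing both score distributions first.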
Storage and indexing implications
Dense vectors are stored as contiguous arrays; indexes like HNSW and IVF are designed for them. Sparse vectors are stored as lists of (dimension, value) pairs; some ANN libraries support sparse or hybrid indexes. Memory usage for dense is roughly dimension × 4 bytes (float32) per vector; for sparse it depends on the number of non-zeros. Quantization techniques such as product quantization (PQ) also usually target dense vectors.
When you have both dense and sparse in the same system, the index may maintain two structures (e.g. HNSW for dense, inverted index for sparse) and merge results at query time. This increases implementation complexity but allows a single query to leverage both representations.
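A back-of-the-envelope sketch of the storage math above, assuming float32 values and 4-byte dimension indices (the collection sizes are illustrative, not from the text):

```python
def dense_bytes(n_vectors, dim, bytes_per_value=4):
    """Contiguous float32 storage: every dimension is materialized."""
    return n_vectors * dim * bytes_per_value

def sparse_bytes(n_vectors, avg_nonzeros, index_bytes=4, value_bytes=4):
    """(index, value) pair storage: only non-zero entries are materialized."""
    return n_vectors * avg_nonzeros * (index_bytes + value_bytes)

GIB = 2**30
# 1M 768-d dense vectors vs. 1M sparse vectors averaging 50 non-zeros
# over a nominally 100k-dimensional vocabulary.
print(dense_bytes(1_000_000, 768) / GIB)   # ~2.86 GiB
print(sparse_bytes(1_000_000, 50) / GIB)   # ~0.37 GiB
```

The sparse figure is independent of the nominal dimension, which is why a 100k-dimension TF–IDF vector can be cheaper to store than a 768-dimension embedding.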
Frequently Asked Questions
Can I mix dense and sparse in one query?
Some vector DBs support hybrid search: one query returns results that combine dense similarity and sparse (e.g. BM25) scores, often via reciprocal rank fusion or weighted combination.
Are transformer embeddings always dense?
Typical transformer-based text embeddings (e.g. sentence-transformers) output dense vectors. Sparse variants exist (e.g. learned sparse representations such as SPLADE-style models), but dense is the norm for embeddings in vector databases.
Does cosine similarity work for sparse vectors?
Cosine is defined for any vector with non-zero norm. For sparse vectors, computation can be optimized (only non-zero dimensions). For very high-dimensional sparse data, Jaccard or overlap is also common.
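The optimization mentioned here, computing cosine and Jaccard only over non-zero entries, can be sketched with the same dict-based sparse representation (toy values):

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity for {dim_index: value} sparse vectors.
    The dot product touches only dimensions present in both vectors."""
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def jaccard(a, b):
    """Set-overlap similarity on the non-zero dimensions only."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

a = {3: 1.0, 7: 2.0}
b = {7: 2.0, 9: 1.0}
print(sparse_cosine(a, b))  # ≈ 0.8
print(jaccard(a, b))        # 1/3 — one shared dim out of three total
```

Jaccard ignores the magnitudes entirely and looks only at which dimensions overlap, which is why it suits binary or set-like sparse features.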
Which distance metric for dense vectors?
For normalized dense vectors, cosine similarity and dot product are equivalent. For unnormalized, L2 is common. See when to use L2 vs. cosine similarity.
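The equivalence for normalized vectors can be checked numerically (the example vectors are arbitrary):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# After normalization both norms are 1, so cosine reduces to the dot product.
assert abs(cosine(a, b) - dot(a, b)) < 1e-12
```

This is why many systems pre-normalize embeddings at ingest time: the index can then use the cheaper dot product (or equivalently rank by L2, since ‖a−b‖² = 2 − 2·(a·b) for unit vectors) while returning the same ordering as cosine.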