Similarity Metrics (Mathematical Foundations) · Topic 42

Cosine Similarity explained.

Cosine similarity measures how similar two vectors are by the angle between them, ignoring their length. It is the dot product of the vectors divided by the product of their norms: cos(a, b) = (a · b) / (‖a‖ ‖b‖). Values range from −1 (opposite) to +1 (identical direction); 0 means orthogonal. It is one of the most common metrics in semantic search and text embeddings.
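The formula translates directly into code. A minimal plain-Python sketch (the function name is illustrative):

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a · b) / (‖a‖ ‖b‖)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))   # orthogonal  → 0.0
print(cosine_similarity([1, 2], [2, 4]))   # same direction → 1.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite    → -1.0
```

In production you would use a vectorized library rather than a Python loop, but the arithmetic is the same.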

Summary

  • Depends only on direction; robust to magnitude. For normalized vectors, cosine equals dot product—fast to compute.
  • Higher = more similar; for ANN “distance” use angular distance or 1 − cos(a, b). See why cosine ignores magnitude and when to use L2 vs. cosine.
  • Typical pipeline: normalize at ingestion and query time, then use dot product in the index; scores in [−1, 1] simplify thresholding and re-ranking.
  • Cosine similarity is not a distance metric; use angular distance or 1 − cos(a, b) when the index expects a distance (smaller = closer).
  • Widely used in semantic search and recommendation; combine with re-ranking for best accuracy when recall matters.

Why direction only

Because cosine depends only on direction, it is robust to magnitude: a short and a long vector in the same direction have cosine 1. That makes it ideal when length is irrelevant—e.g. many text embedding models produce vectors whose direction encodes meaning.
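This scale invariance is easy to verify: multiplying a vector by any positive constant leaves its cosine with another vector unchanged. A quick sketch:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

v = [0.3, -0.7, 0.2]
short_vec = v
long_vec = [1000 * x for x in v]  # same direction, 1000x the length

print(cosine(short_vec, long_vec))  # 1.0 (up to floating-point error)
```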

Discarding magnitude reduces sensitivity to embedding scale and batch effects: two documents with the same semantic content but different chunk lengths or model runs still get high cosine if their directions align. That stability is why cosine is the default for text and semantic retrieval in most vector databases and embedding benchmarks.

For normalized vectors (unit length), cosine similarity equals the dot product, which is fast to compute and is how many vector DBs implement cosine search internally. Practical tip: ensure both stored and query vectors are normalized the same way so that scores are comparable and thresholding is meaningful.
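The equivalence is easy to check: after normalizing to unit length, the plain dot product reproduces the full cosine formula (a sketch, with illustrative helper names):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [1.0, 2.0]
a_hat, b_hat = normalize(a), normalize(b)

# Full cosine vs. dot product on pre-normalized vectors:
cos_full = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
cos_fast = dot(a_hat, b_hat)
print(abs(cos_full - cos_fast) < 1e-9)  # the two agree
```

This is why normalizing once at ingestion pays off: every subsequent comparison skips the two norm computations.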

Using cosine as a distance

Cosine is a similarity (higher = more similar). To use it as a distance for nearest neighbor indexes, it’s often converted to angular distance or to 1 − cos(a, b) so that “closer” means smaller distance. Choosing cosine over Euclidean (L2) depends on whether magnitude matters in your embeddings.

Trade-off: cosine ignores magnitude, which is good for semantic text but can be wrong when vector length carries information (e.g. confidence or intensity). When in doubt for text embeddings, cosine is the default; for other modalities or mixed signals, compare L2 and cosine on a small eval set.

When defining a “good” match, use a threshold on cosine (e.g. > 0.7 or > 0.8) depending on your data and recall requirements. Thresholding and normalized vs. unnormalized distance scores affect how you interpret and tune these cutoffs; keep the metric fixed across ingestion and query so that thresholds remain comparable.
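Thresholding then reduces to a simple filter over retrieved scores. A sketch with hypothetical document IDs and scores (the cutoff value is something you would tune on your own data):

```python
# Hypothetical candidates from a cosine-scored index: (doc_id, score) pairs.
candidates = [("doc-1", 0.91), ("doc-2", 0.74), ("doc-3", 0.42)]

THRESHOLD = 0.7  # common starting point; tune per dataset and recall target

matches = [(doc_id, score) for doc_id, score in candidates if score > THRESHOLD]
print(matches)  # keeps doc-1 and doc-2, drops doc-3
```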

Pipeline and implementation

Typical pipeline: embed documents and queries with the same model; normalize all vectors to unit length before upserting and before each query. Create a collection with cosine (or dot product) as the distance metric. At search time, the index compares the query vector to stored vectors via dot product; returned scores are in [−1, 1], which makes thresholding and re-ranking straightforward.
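The whole pipeline can be sketched with a toy in-memory index; real vector databases add approximate search and persistence, but the normalize-then-dot-product flow is the same (class and method names are illustrative, not any particular DB's API):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

class CosineIndex:
    """Toy index: stores unit vectors, compares by dot product."""

    def __init__(self):
        self.items = []  # (doc_id, unit_vector)

    def upsert(self, doc_id, vector):
        # Normalize once at ingestion.
        self.items.append((doc_id, normalize(vector)))

    def search(self, query, k=3):
        # Normalize the query the same way, then rank by dot product.
        q = normalize(query)
        scored = [(doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in self.items]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

index = CosineIndex()
index.upsert("a", [1.0, 0.0])
index.upsert("b", [1.0, 1.0])
index.upsert("c", [-1.0, 0.0])
print(index.search([2.0, 0.1], k=2))  # "a" first, then "b"
```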

Many embedding APIs return normalized vectors by default. If yours does not, normalize once at ingestion and once per query. Normalization is idempotent, so double-normalizing is harmless, just redundant. For how cosine compares to other metrics in CPU cost and for faster execution, see computation overhead of different distance metrics and hardware acceleration (SIMD) for distance calculations.

Practical tip: when migrating from one embedding model to another, re-normalize and re-index; cosine scores are not comparable across models. If you use re-ranking after retrieval, cosine (or dot product) scores from the first stage are often fed into a cross-encoder or scorer for a second pass—see re-ranking for why that improves accuracy.

Frequently Asked Questions

What range does cosine similarity have?

From −1 (opposite direction) to +1 (same direction); 0 means orthogonal. For non-negative embeddings the dot product cannot be negative, so the range narrows to [0, 1].

How do vector DBs implement cosine search?

Usually by storing normalized vectors and using inner product for comparison (equivalent to cosine). See dot product and cosine on normalized vectors.

Is cosine a metric?

Cosine itself is a similarity, not a distance. Use 1 − cos as a simple dissimilarity, or angular distance (the arccos of cosine), which behaves as a true metric on unit vectors; see angular distance vs. cosine similarity.

When is cosine better than L2?

When you care only about direction (e.g. semantic text). When magnitude matters (e.g. confidence scores), L2 may be better; see impact of distance metrics on recall for evaluation.