Why does cosine similarity ignore magnitude?
Cosine similarity is defined as the dot product of two vectors divided by the product of their norms. That ratio depends only on the direction (angle) of the vectors, not their length. So scaling a vector by any positive constant leaves its cosine similarity with every other vector unchanged.
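A minimal sketch of the definition with NumPy (illustrative; not tied to any particular library):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Scaling either vector by a positive constant leaves the cosine unchanged.
assert np.isclose(cosine_similarity(a, b), cosine_similarity(5.0 * a, b))
```

The norms in the denominator cancel out any positive scaling of either input, which is exactly why magnitude drops out.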
Summary
- For embeddings, ignoring magnitude is often desirable: direction encodes “which” concepts; length can be an artifact. Many models output normalized or nearly normalized vectors.
- When magnitude matters (e.g. confidence or relevance), use L2 distance or the unnormalized dot product. See dot product and cosine on normalized vectors.
- Scaling a vector by any positive constant leaves its cosine with every other vector unchanged; only the angle matters.
- Pipeline: use cosine when direction is the signal (e.g. text semantics); use L2 or dot product when the norm encodes importance or confidence.
- Trade-off: ignoring magnitude simplifies comparison and matches many embedding training objectives; when the norm is informative, prefer a magnitude-aware metric and interpret scores accordingly.
Why direction-only helps embeddings
In text or image embeddings, “how much” of something (e.g. document length or pixel intensity) can be a side effect of representation; what usually matters for semantic search is which concepts are present—i.e. direction in latent space. Cosine similarity treats “same direction” as same meaning, regardless of vector length, which aligns well with many embedding models whose outputs are already normalized or nearly so.
Trade-off: ignoring magnitude simplifies comparison and matches many embedding training objectives (e.g. contrastive loss on normalized vectors). When magnitude is informative (e.g. model confidence, relevance score), cosine can be the wrong choice; then L2 or unnormalized dot product preserves that signal.
When to use a different metric
When magnitude does matter (e.g. importance or confidence), L2 distance or dot product (unnormalized) can be better. Many vector databases support multiple metrics so you can choose: use cosine when you care about orientation, and L2 or dot product when scale carries information.
For normalized data, cosine and dot product rank neighbors identically while differing in scale and interpretation; on unit vectors, squared L2 distance equals 2 − 2·cosine, so L2 agrees on ranking as well. Practical tip: if your embedding model outputs unnormalized vectors whose length varies (e.g. with document length), either normalize and use cosine, or keep the raw vectors and use L2, so that ranking is consistent with your notion of similarity.
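A quick sanity check of this ranking equivalence on random unit vectors (a NumPy sketch, not specific to any vector database):

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=8)
docs = rng.normal(size=(5, 8))

# Normalize everything to unit length.
q = query / np.linalg.norm(query)
d = docs / np.linalg.norm(docs, axis=1, keepdims=True)

dot_scores = d @ q                        # dot product == cosine on unit vectors
l2_dists = np.linalg.norm(d - q, axis=1)  # squared L2 = 2 - 2 * cosine here

# Descending dot/cosine order matches ascending L2 order.
assert np.array_equal(np.argsort(-dot_scores), np.argsort(l2_dists))
```

The scores themselves differ (cosine in [−1, 1], L2 in [0, 2] for unit vectors), which is why thresholds must be interpreted per metric even when rankings agree.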
Pipeline summary: decide at collection creation whether you want direction-only (cosine) or magnitude-aware (L2/dot product) comparison. When to use L2 vs. cosine similarity gives a decision guide; normalized vs. unnormalized distance scores explains how to interpret and threshold in each case. Re-ranking after retrieval can use a different notion of relevance (e.g. cross-encoder) without changing the index metric.
Frequently Asked Questions
Does cosine change if I double every component of a vector?
No. Doubling the vector keeps direction the same, so cosine with any other vector is unchanged.
Why do text embeddings often use cosine?
Document length or token count can inflate norms without changing meaning; cosine focuses on semantic direction. See when to use L2 vs. cosine.
Can cosine be negative?
Yes. A negative cosine means the angle between the vectors exceeds 90°; a value of −1 means they point in exactly opposite directions. This is common when embedding components can be negative; see handling negative values in vector components.
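A small illustration of the sign behavior (NumPy sketch):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
assert cosine(a, np.array([0.0, 1.0])) == 0.0              # orthogonal: 90 degrees
assert np.isclose(cosine(a, np.array([-1.0, 0.0])), -1.0)  # exactly opposite
assert cosine(a, np.array([-1.0, 1.0])) < 0.0              # angle greater than 90 degrees
```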
Is “ignoring magnitude” always good?
No. When norm encodes importance or confidence (e.g. model confidence scores), L2 or unnormalized dot product may be better.
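A sketch of what cosine discards and L2 keeps, using hypothetical vectors where the norm is assumed to encode confidence:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two vectors with the same direction but different norms
# (the norm hypothetically encoding model confidence).
weak = np.array([0.1, 0.2])
strong = np.array([1.0, 2.0])
query = np.array([1.0, 2.0])

# Cosine treats both as identical to the query; L2 preserves the difference.
assert np.isclose(cosine(weak, query), cosine(strong, query))
assert np.linalg.norm(query - weak) > np.linalg.norm(query - strong)
```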