Similarity Metrics (Mathematical Foundations) · Topic 47

When to use L2 vs. Cosine Similarity?

Use L2 (Euclidean) when the magnitude of the vector is meaningful—e.g. confidence, intensity, or norm reflects importance. Use cosine similarity when you care only about direction (e.g. semantic meaning in text embeddings). For normalized vectors, the nearest neighbor under L2 is the same as under cosine; the choice then often comes down to what your embedding model and pipeline assume.
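The equivalence for normalized vectors is easy to check numerically. A minimal sketch with toy random data (all names here are illustrative): after normalizing everything to unit length, ranking by ascending L2 distance gives the same order as ranking by descending cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 database vectors and one query, dimension 8.
db = rng.normal(size=(100, 8))
query = rng.normal(size=8)

# Normalize everything to unit length.
db /= np.linalg.norm(db, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Cosine similarity (higher = more similar) and L2 distance (lower = closer).
cos_sim = db @ query
l2_dist = np.linalg.norm(db - query, axis=1)

# On unit vectors the two rankings are identical.
assert np.array_equal(np.argsort(-cos_sim), np.argsort(l2_dist))
```

This follows from the identity ||a − b||² = 2 − 2·cos(a, b) for unit vectors: L2 distance is a monotone function of cosine similarity, so nearest neighbors coincide.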

Summary

  • Many text/image embedding models are trained with cosine or inner product on normalized vectors—cosine is the “natural” metric. Unnormalized features (e.g. counts) may suit L2.
  • Match the metric to how embeddings were produced. Try both and compare recall; see dot product vs. cosine on normalized vectors for implementation.
  • For normalized vectors, nearest neighbor under L2 equals nearest under cosine; choice then depends on pipeline and engine defaults.
  • Pipeline: pick one metric per collection at creation time; many indexes are built for a specific distance and cannot be switched without rebuild.
  • Thresholding: cosine in [−1, 1] is easier to interpret; L2 is unbounded—see thresholding and normalized vs. unnormalized scores when defining cutoffs.

Embedding model and pipeline

Many text and image embedding models are trained with cosine or inner product on normalized vectors, so the “natural” metric is cosine (or equivalently dot product after normalization). If you use such embeddings with L2 without normalizing, long vectors can dominate and ranking can change.
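The effect of magnitude on L2 rankings can be shown with a two-dimensional toy example (the vectors below are made up for illustration): a vector pointing in exactly the query's direction but with a large norm is a perfect cosine match, yet loses under L2 to a nearby vector with a different direction.

```python
import numpy as np

query = np.array([1.0, 0.0])
a = np.array([10.0, 0.0])   # same direction as the query, large norm
b = np.array([0.8, 0.6])    # different direction, unit norm

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Cosine prefers a (identical direction); L2 prefers b (closer in space).
assert cosine(query, a) > cosine(query, b)
assert np.linalg.norm(query - b) < np.linalg.norm(query - a)
```

Normalizing `a` and `b` before indexing would remove the disagreement, which is why normalization matters when using L2 with cosine-trained embeddings.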

Conversely, if your features are unnormalized by design (e.g. counts, energies), L2 may be more appropriate. Practical tip: check your embedding model’s documentation—OpenAI, Cohere, and many open-source models assume cosine or dot product on normalized outputs; using L2 on raw outputs can degrade quality.

Practical guidance

Practical rule: match the metric to how the embeddings were produced and how you want to interpret similarity. When in doubt, try both and compare recall and user-facing quality. Vector DBs typically let you specify the metric per collection.

Trade-off: cosine ignores magnitude (good for semantic text); L2 uses magnitude (good when norm encodes confidence or intensity). For thresholding and interpretability, cosine scores in [−1, 1] are often easier; L2 distances are unbounded above. See thresholding and how to define a good match score for setting cutoffs.
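When vectors are normalized, a cosine cutoff can be translated into an equivalent L2 cutoff via ||a − b||² = 2(1 − cos). A quick sketch with arbitrary random unit vectors (the cutoff value 0.8 is just an example):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 16))
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_sim = a @ b
l2_dist = np.linalg.norm(a - b)

# For unit vectors: ||a - b||^2 = 2 * (1 - cos_sim)
assert np.isclose(l2_dist**2, 2 * (1 - cos_sim))

# So a cosine cutoff of 0.8 maps to an L2 cutoff of sqrt(2 * 0.2).
cos_cutoff = 0.8
l2_cutoff = np.sqrt(2 * (1 - cos_cutoff))
```

This lets you keep an interpretable cosine threshold even when the search engine reports L2 distances on normalized vectors.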

Practical tip: run an A/B test on a small labeled set—compute recall@k and optionally nDCG or MRR for both L2 and cosine with the same embeddings (normalized for cosine). Choose the metric that aligns with your product notion of “similar” and stick with it for the whole pipeline; changing later usually requires re-indexing.
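A recall@k comparison like the one above can be sketched in a few lines. The data and helper below are entirely hypothetical (a real evaluation would use your own labeled query/ground-truth pairs and your production index):

```python
import numpy as np

def recall_at_k(index_vecs, queries, ground_truth, k, metric):
    """Fraction of queries whose ground-truth item appears in the top-k."""
    hits = 0
    for q, gt in zip(queries, ground_truth):
        if metric == "cosine":
            qn = q / np.linalg.norm(q)
            dbn = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
            topk = np.argsort(-(dbn @ qn))[:k]
        else:  # L2 on raw vectors
            topk = np.argsort(np.linalg.norm(index_vecs - q, axis=1))[:k]
        hits += int(gt in topk)
    return hits / len(queries)

# Toy labeled set: query i's ground truth is database vector i.
rng = np.random.default_rng(2)
db = rng.normal(size=(50, 8)) * rng.uniform(0.5, 5.0, size=(50, 1))
queries = db[:10] + 0.05 * rng.normal(size=(10, 8))
gt = list(range(10))

r_cos = recall_at_k(db, queries, gt, 5, "cosine")
r_l2 = recall_at_k(db, queries, gt, 5, "l2")
```

Running both metrics over the same embeddings and labels gives a direct, apples-to-apples comparison before you commit to a collection-level metric.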

Switching the metric after indexing

Usually you need to rebuild or re-ingest if you change the metric, since many indexes are built for a specific distance (e.g. HNSW with L2 or IP). Plan metric choice at collection creation; switching later may require a new collection and migration.

Pipeline summary: at collection creation, set the distance type (L2, cosine/IP, or other) and ensure your ingestion normalizes (or does not) consistently. All queries must use the same metric and normalization; re-ranking and filtering can happen after retrieval without changing the index metric.
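The pipeline rule above can be sketched as a tiny in-memory collection. The class and method names are illustrative only and do not correspond to any real vector DB API; the point is that the metric is fixed at creation time and ingest and query normalize consistently.

```python
import numpy as np

class Collection:
    """Toy collection: metric is fixed at creation, as in most vector DBs."""

    def __init__(self, dim, metric="cosine"):
        self.metric = metric                 # fixed for the collection's lifetime
        self.vectors = np.empty((0, dim))

    def _prep(self, v):
        # Normalize iff the collection uses cosine, so ingest and query agree.
        return v / np.linalg.norm(v) if self.metric == "cosine" else v

    def add(self, v):
        self.vectors = np.vstack([self.vectors, self._prep(v)])

    def search(self, q, k=5):
        q = self._prep(q)
        if self.metric == "cosine":
            scores = self.vectors @ q        # dot = cosine on unit vectors
            return np.argsort(-scores)[:k]
        dists = np.linalg.norm(self.vectors - q, axis=1)
        return np.argsort(dists)[:k]
```

Because `_prep` is applied on both paths, a query can never be compared against vectors under a different normalization, which is the consistency property the prose describes.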

Frequently Asked Questions

What if my embeddings are not normalized?

You can normalize at ingest and then use cosine (via dot product) for search, or keep them unnormalized and use L2 if magnitude is meaningful. Normalizing vectors: why it is necessary covers when and how to normalize.
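The normalize-then-dot approach can be verified directly. A short sketch with arbitrary unnormalized vectors: after dividing each vector by its norm, a plain dot product reproduces the full cosine formula.

```python
import numpy as np

rng = np.random.default_rng(3)
u, v = rng.normal(size=(2, 4)) * 3.0   # unnormalized vectors

# Full cosine formula on the raw vectors.
full_cosine = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Normalize once at ingest; then search-time cosine is just a dot product.
un = u / np.linalg.norm(u)
vn = v / np.linalg.norm(v)
assert np.isclose(un @ vn, full_cosine)
```

This is why many engines implement cosine search as inner product over pre-normalized vectors: the division is paid once at ingest instead of on every query.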

Does the choice affect recall?

Yes. Using the wrong metric for your data can hurt recall and ranking; impact of distance metrics on recall describes how to evaluate and compare.

Can I switch metric after indexing?

Usually you need to rebuild or re-ingest with the new metric, since many indexes are built for a specific distance (e.g. L2 or IP).

Why do APIs offer both L2 and cosine?

Different use cases: L2 when magnitude matters; cosine when only direction matters. Implementation often does cosine via normalized vectors and dot product.