Euclidean Distance (L2) explained.
Euclidean distance (also called L2 distance) is the straight-line distance between two vectors. It is the default “as the crow flies” measure in geometry and is widely used in vector databases for nearest neighbor search when magnitude matters—i.e., when the length of the vector carries information.
Summary
- Formula: L2(a, b) = √(Σ(aᵢ − bᵢ)²); smaller L2 = closer; many ANN indexes use L2 or L2² for nearest-neighbor search.
- L2 is sensitive to scale; use it when magnitude is meaningful. For direction-only comparisons (e.g. text semantics), cosine similarity is often preferred.
- See when to use L2 vs. cosine similarity for trade-offs; vector DBs typically support L2 per collection.
- Squared L2 (L2²) preserves neighbor ordering and avoids the square root; use it inside indexes for speed, then optionally convert to L2 for interpretable scores.
- L2 is a true metric (triangle inequality holds), which many graph-based ANN algorithms rely on for pruning and correctness.
Formula and computation
For two vectors a and b of dimension d, the Euclidean distance is the square root of the sum of squared differences: L2(a, b) = √(Σ(aᵢ − bᵢ)²). In 2D or 3D this is the familiar ruler distance; in high dimensions it generalizes the same idea. Smaller L2 means closer points.
Many ANN indexes (e.g. HNSW, IVF) are built to minimize L2 or its squared form (L2²) to avoid the square root and speed up comparisons. Ordering is preserved: the k nearest under L2 are the same as the k nearest under L2², so implementations often compute L2² internally and only take the square root when returning human-readable distances.
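This ordering equivalence is easy to check numerically. Below is a minimal NumPy sketch on random toy data (the shapes and seed are arbitrary): because the square root is monotonic on non-negative values, sorting by L2 and sorting by L2² produce the same permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=8)            # one query vector
points = rng.normal(size=(100, 8))    # toy "database" of 100 vectors

diffs = points - query
l2_sq = np.sum(diffs ** 2, axis=1)    # squared L2: no square root
l2 = np.sqrt(l2_sq)                   # true L2, only needed for display

# sqrt is monotonic, so the neighbor ordering is identical
assert np.array_equal(np.argsort(l2), np.argsort(l2_sq))
```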
In practice, L2 is computed in a single pass over dimensions (sum of squared differences, then square root). For batch comparisons, libraries often use SIMD or GPU to compute many L2 distances in parallel; see hardware acceleration for distance calculations. Numerical stability is rarely an issue for typical embedding dimensions (e.g. 256–1536), but for very large coordinates consider scaling or using L2² to avoid overflow in squared terms.
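To illustrate the batched case, here is one common vectorized formulation in NumPy. The function name `pairwise_l2_sq` and the dot-product expansion are an assumption about how a back end might organize the computation, not any specific library's API; real systems typically push this through SIMD or GPU kernels.

```python
import numpy as np

def pairwise_l2_sq(queries, points):
    """Squared L2 distance between every query and every point.

    Uses the expansion ||q - p||^2 = ||q||^2 - 2 q.p + ||p||^2,
    which turns the bulk of the work into one matrix multiply.
    """
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)  # shape (nq, 1)
    p_sq = np.sum(points ** 2, axis=1)                  # shape (np,)
    cross = queries @ points.T                          # shape (nq, np)
    # Clamp tiny negatives caused by floating-point cancellation.
    return np.maximum(q_sq - 2.0 * cross + p_sq, 0.0)
```

The same quantity can be verified against a direct per-pair loop; the expansion trades a little numerical precision for a large speedup on big batches.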
Scale sensitivity
L2 is sensitive to scale: doubling every component of a vector doubles its distance from the origin. So it is a good choice when magnitude is meaningful—e.g. when embedding norms reflect confidence or intensity. When you care only about direction (e.g. text semantics), cosine similarity is often preferred. Many vector DBs support L2 as the default metric or as a per-collection option.
Practical tip: if your embeddings are not normalized and their length carries information (e.g. relevance or confidence), stick with L2. If you normalize, L2 on the unit hypersphere is equivalent to angle-based similarity; see normalized vs. unnormalized distance scores for how this affects thresholding and recall.
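The equivalence on the unit hypersphere follows from the identity L2²(a, b) = 2 − 2·cos(a, b) for unit vectors, which a few lines of NumPy confirm (random vectors; the dimension is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=64)
b = rng.normal(size=64)
a /= np.linalg.norm(a)   # normalize to unit length
b /= np.linalg.norm(b)

cos_sim = float(a @ b)                # cosine similarity of unit vectors
l2_sq = float(np.sum((a - b) ** 2))   # squared Euclidean distance

# On the unit hypersphere: L2² = 2 − 2·cos
assert abs(l2_sq - (2.0 - 2.0 * cos_sim)) < 1e-9
```

Because L2² is a decreasing function of cosine similarity here, ranking by either one yields the same nearest neighbors.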
Pipeline and indexing
Typical pipeline: embed documents and queries with the same model, upsert vectors into a collection configured for L2 (or L2²), and run k-NN or ANN search. The index stores vectors and, for L2², compares by sum of squared differences (SSD) without computing square roots. At query time, the same metric is used so that returned distances are consistent and rank order is correct.
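A toy sketch of this pipeline, assuming a hypothetical `BruteForceL2Index` class (exact brute-force search over squared differences, standing in for a real vector DB or ANN index):

```python
import numpy as np

class BruteForceL2Index:
    """Minimal exact k-NN index over squared L2. Illustrative only."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim))
        self.ids = []

    def upsert(self, ids, vectors):
        # Append new vectors; a real DB would also handle updates by id.
        self.vectors = np.vstack([self.vectors, np.asarray(vectors, dtype=float)])
        self.ids.extend(ids)

    def search(self, query, k=5):
        # Rank by sum of squared differences: no square root needed to order.
        ssd = np.sum((self.vectors - query) ** 2, axis=1)
        order = np.argsort(ssd)[:k]
        # Take the root only when returning human-readable distances.
        return [(self.ids[i], float(np.sqrt(ssd[i]))) for i in order]

index = BruteForceL2Index(dim=3)
index.upsert(["a", "b"], [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
hits = index.search(np.array([0.0, 0.0, 0.0]), k=2)  # "a" first, then "b" at distance 5
```

The key point mirrors the text: the metric used to build and query the index is the same, so rank order is consistent end to end.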
Trade-off: L2 is well supported and intuitive, but in very high dimensions all points can become almost equidistant (curse of dimensionality). ANN algorithms mitigate this by restricting search to promising regions; choosing the right index (e.g. HNSW vs. IVF) and parameters affects recall and latency more than the choice of L2 vs. L2².
When to use L2 and when to avoid it
Use L2 when vector magnitude is meaningful—e.g. regression outputs, intensity features, or embeddings that are not normalized by design. Avoid L2 when you care only about direction (e.g. most text embeddings): then cosine similarity or dot product on normalized vectors is usually better. For normalized vectors, L2 and cosine give the same nearest-neighbor ordering; in some back ends, L2² is slightly cheaper to compute than cosine.
Practical tip: check your embedding model’s documentation—many models output unit-normalized vectors by default, in which case L2 and cosine yield the same ranking. If you are unsure, run a small experiment: compare recall@k for L2 vs. cosine on a labeled set, then choose the metric that matches your notion of “similar” and stick with it for the whole pipeline. Thresholds for “good” matches depend on the metric; see thresholding and normalized vs. unnormalized distance scores when interpreting raw scores.
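A starting point for such an experiment might look like the following (the `top_k` helper is hypothetical, and random unit vectors stand in for a labeled set; a real evaluation would compare recall@k against ground-truth relevance labels):

```python
import numpy as np

def top_k(query, points, metric, k=5):
    """Indices of the k nearest points under the chosen metric."""
    if metric == "l2":
        scores = np.sum((points - query) ** 2, axis=1)   # smaller = closer
    else:  # cosine distance = 1 − cosine similarity
        scores = 1.0 - (points @ query) / (
            np.linalg.norm(points, axis=1) * np.linalg.norm(query)
        )
    return np.argsort(scores)[:k]

rng = np.random.default_rng(2)
docs = rng.normal(size=(200, 32))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit-normalized "embeddings"
q = rng.normal(size=32)
q /= np.linalg.norm(q)

# With unit-normalized vectors the two metrics agree on the top-k.
assert np.array_equal(top_k(q, docs, "l2"), top_k(q, docs, "cosine"))
```

With unnormalized vectors the two rankings can diverge, which is exactly the case where the choice of metric matters.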
Frequently Asked Questions
Can I use L2 on normalized vectors?
Yes. For unit-length vectors, the nearest neighbor under L2 is the same as under cosine (both reflect angle). Some systems normalize and then use L2; see dot product and cosine on normalized vectors.
Why use L2² instead of L2 in indexes?
Squared L2 avoids the square root; ordering of neighbors is the same (min L2 ⇔ min L2²). So comparisons are cheaper. Final scores can be square-rooted if you need true distance for display.
Does L2 satisfy the triangle inequality?
Yes. L2 is a true metric; see mathematical properties of a metric space. That property is used by some ANN algorithms for pruning.
When would I choose L2 over inner product?
When vectors are not normalized and magnitude matters. For normalized vectors, L2 and (negative) inner product give the same ranking; inner product is often faster. See impact of distance metrics on recall.