Mathematical properties of a metric space
A metric (distance function) d on a set must satisfy: (1) non-negativity: d(x, y) ≥ 0; (2) identity: d(x, y) = 0 iff x = y; (3) symmetry: d(x, y) = d(y, x); (4) triangle inequality: d(x, z) ≤ d(x, y) + d(y, z). Many vector distances used in vector databases are true metrics; some similarity functions are not.
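As an illustrative check, the four axioms can be verified numerically for the Euclidean (L2) distance on a small sample of points (a sketch using NumPy; any point set would do):

```python
import numpy as np

def l2(x, y):
    """Euclidean (L2) distance between two vectors."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

# A few sample points in R^2.
points = [np.array(p, dtype=float) for p in [(0, 0), (3, 4), (1, 1), (-2, 5)]]

for x in points:
    for y in points:
        assert l2(x, y) >= 0                          # (1) non-negativity
        assert (l2(x, y) == 0) == bool(np.allclose(x, y))  # (2) identity
        assert np.isclose(l2(x, y), l2(y, x))         # (3) symmetry
        for z in points:
            # (4) triangle inequality (tiny slack for float rounding)
            assert l2(x, z) <= l2(x, y) + l2(y, z) + 1e-12
```

An exhaustive check over all points is of course not a proof, but it makes the axioms concrete; L2 satisfies all four for any inputs.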
Summary
- L2, L1, angular, and Hamming distances are true metrics. Cosine is a similarity, and 1 − cosine can violate the triangle inequality; the dot product is neither a metric nor bounded.
- Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z). Graph- and tree-based ANN indexes (e.g. HNSW) use it to bound distances and skip branches during traversal, so they prefer metric distances.
- Practical rule: pick a built-in metric (L2, L1, or angular; cosine via L2 on normalized vectors) so the index can prune, and avoid non-metric “distances” unless the engine explicitly supports them. See custom distance functions.
Which functions are metrics
L2, L1, and angular distance are metrics. Cosine similarity is not a distance (it’s a similarity; 1 − cosine is not a metric because it can violate the triangle inequality). The dot product is neither a metric nor bounded. Graph-based ANN algorithms like HNSW rely on the triangle inequality to prune paths during search; they are designed for metric distances.
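A concrete counterexample makes the failure visible. In the sketch below (NumPy, illustrative function names), 1 − cosine violates the triangle inequality for three simple 2-D vectors, while the corresponding angular distance does not:

```python
import numpy as np

def cosine_dist(x, y):
    """'Cosine distance' 1 - cos(theta); NOT a true metric."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def angular_dist(x, y):
    """Angle in radians between x and y; a true metric on the unit sphere."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c = np.clip(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)), -1.0, 1.0)
    return float(np.arccos(c))

x, y, z = (1, 0), (1, 1), (0, 1)

# 1 - cosine: the direct "distance" exceeds the detour through y.
lhs = cosine_dist(x, z)                        # 1.0
rhs = cosine_dist(x, y) + cosine_dist(y, z)    # about 0.586
assert lhs > rhs  # triangle inequality violated

# Angular distance: pi/2 <= pi/4 + pi/4 holds (with equality here).
assert angular_dist(x, z) <= angular_dist(x, y) + angular_dist(y, z) + 1e-12
```

The same three vectors show why angular distance is the safe replacement: angles along a detour always add up to at least the direct angle.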
Using a non-metric “distance” can still work for nearest-neighbor ranking, but it weakens the theoretical guarantees the index relies on. Practical tip: when in doubt, use L2, or cosine implemented as a dot product on normalized vectors. L2 is a true metric, and cosine on unit vectors is equivalent to one, since it is monotone in both angular distance and L2 on the unit sphere. See Hamming distance for binary vectors for another common metric.
Choosing and implementing distance
When implementing or choosing a distance function, check whether your index assumes a metric. If it does, prefer L2 or angular distance (or the index’s native metric). If you use cosine, many systems convert it to angular distance or to L2 on normalized vectors so that the index sees a proper metric while you reason in terms of similarity.
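The cosine-to-L2 conversion rests on the identity ‖x̂ − ŷ‖² = 2(1 − cos θ) for unit vectors x̂, ŷ: after normalization, squared L2 is a monotone function of cosine similarity, so an L2 index returns the same neighbor ordering. A minimal sketch (function names are illustrative):

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, float)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
x, y = rng.normal(size=8), rng.normal(size=8)
xn, yn = normalize(x), normalize(y)

cos_sim = float(xn @ yn)
l2_sq = float(np.sum((xn - yn) ** 2))

# ||x_hat - y_hat||^2 == 2 * (1 - cos_sim): larger cosine <=> smaller L2.
assert np.isclose(l2_sq, 2.0 * (1.0 - cos_sim))
```

Because the map is monotone, normalizing at ingest time and querying with L2 (or inner product on unit vectors) gives cosine semantics while the index sees a proper metric.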
Trade-off: metric distances enable efficient pruning and have clear semantics; non-metric similarities (e.g. raw dot product on unnormalized vectors) can still rank correctly but may not integrate well with graph or tree indexes that assume the triangle inequality. For custom or non-standard metrics, see “custom distance functions: are they supported”, which describes engine support and workarounds.
Pipeline summary: when creating a collection, pick a built-in metric (L2, cosine/IP, L1) so the index can exploit the triangle inequality. If you need a non-metric similarity, pre-transform vectors so that L2 in the transformed space matches your notion, or use a two-stage pipeline (ANN with a metric, then re-rank with your function). Mathematical properties of a metric space underpin why L2, L1, angular, and Hamming are safe choices for graph and tree indexes.
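The two-stage option might look like the following sketch, where brute-force L2 search stands in for a real ANN index and the re-ranking function (raw dot product here) is illustrative:

```python
import numpy as np

def ann_candidates(query, vectors, k):
    """Stage 1: metric (L2) candidate search.
    A real system would use an ANN index (e.g. HNSW) here."""
    d = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(d)[:k]

def rerank(query, vectors, candidate_ids, k):
    """Stage 2: re-score the small candidate set with a
    non-metric similarity (raw dot product), exactly."""
    scores = vectors[candidate_ids] @ query
    order = np.argsort(-scores)[:k]
    return [int(candidate_ids[i]) for i in order]

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 16))
q = rng.normal(size=16)

cands = ann_candidates(q, db, k=50)   # cheap, prunable metric search
top = rerank(q, db, cands, k=10)      # exact non-metric scoring on 50 items
```

The design point is that the expensive, index-accelerated stage stays metric, while the non-metric notion of relevance only ever touches a small candidate list.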
Frequently Asked Questions
What is the triangle inequality?
d(x, z) ≤ d(x, y) + d(y, z). The direct distance between two points is at most the distance via any third point. Used by ANN to prune search.
Is cosine similarity a metric?
No. It’s a similarity (higher = closer). 1 − cosine is not a metric either (can violate triangle inequality). Angular distance (angle in radians) is a metric.
Why do ANN indexes care about metrics?
Graph and tree indexes use the triangle inequality to bound distances and skip branches. Non-metric “distances” can break those guarantees and hurt recall or correctness.
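The bound comes from the reverse triangle inequality: if d(q, p) and d(p, x) are known, then d(q, x) ≥ |d(q, p) − d(p, x)|, so x can be skipped whenever that lower bound already exceeds the best distance found. A minimal sketch of such a pivot-based filter (hypothetical helper; real indexes precompute the pivot-to-point distances):

```python
import numpy as np

def l2(a, b):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def prune_with_pivot(query, pivot, points, best_so_far):
    """Keep only points whose triangle-inequality lower bound
    could still beat the current best distance."""
    dqp = l2(query, pivot)
    survivors = []
    for x in points:
        # |d(q,p) - d(p,x)| <= d(q,x): a free lower bound on d(q,x).
        lower_bound = abs(dqp - l2(pivot, x))
        if lower_bound <= best_so_far:
            survivors.append(x)   # might be closer; must be checked exactly
        # else: provably farther than best_so_far; d(q, x) never computed
    return survivors
```

With a non-metric “distance”, this lower bound can be wrong, which is exactly how pruning starts discarding true neighbors.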
Is Hamming distance a metric?
Yes. Non-negative, zero iff equal, symmetric, and satisfies the triangle inequality. See Hamming distance for binary vectors.