Computation overhead of different distance metrics
Different similarity and distance metrics have different per-vector cost. For high-dimensional vectors and billions of comparisons, the choice of metric can materially affect latency and throughput in a vector database.
Summary
- Dot product is cheapest (one multiply-accumulate per dimension). Cosine adds two norm computations and a division unless vectors are pre-normalized, in which case it reduces to a dot product. L2 needs squared differences, a sum, and a square root; L1 needs absolute differences and a sum. See SIMD for speedups.
- Many systems store normalized vectors and use dot product for ranking; balance recall and semantics with cost.
- L2² avoids the square root and preserves neighbor order; many indexes use L2² internally. Cost scales linearly in dimension d for all common metrics.
- Pipeline: prefer normalized + dot product (cosine) for lowest cost when direction-only is sufficient; use L2 or L2² when magnitude matters and SIMD is available.
- Trade-off: dot product is fastest; L2/L1 slightly heavier; Mahalanobis and custom metrics can dominate at high d. Quantization (e.g. INT8) further reduces memory and can speed dot product.
Per-metric cost
Dot product (inner product) is cheapest: one multiply-accumulate per dimension, no square roots or divisions. Cosine similarity adds two norms (or uses pre-normalized vectors so it reduces to dot product) and a division. Euclidean (L2) distance requires squared differences, sum, and a square root—more operations and often slower. Manhattan (L1) uses absolute differences and sum, typically between dot product and L2 in cost.
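The per-metric operation counts above can be sketched directly; a minimal illustration (assuming numpy), including the cosine-to-dot-product reduction on pre-normalized vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(768), rng.standard_normal(768)

dot = a @ b                                             # d multiply-adds
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # + 2 norms, 1 division
l2 = np.sqrt(np.sum((a - b) ** 2))                      # d subs, d squares, sum, sqrt
l1 = np.sum(np.abs(a - b))                              # d subs, d abs, sum

# Pre-normalize once and cosine becomes a plain dot product:
an, bn = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(an @ bn, cosine)
```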
Metrics that need extra data (e.g. Mahalanobis with a covariance matrix) add matrix ops and can be significantly heavier. Practical tip: when your embedding model outputs normalized vectors, use a “cosine” or “IP” index so the engine does a single dot-product loop; avoid computing norms at query time. See scalar quantization (SQ) for reducing memory and speeding dot product with INT8.
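The practical tip above can be sketched as "normalize at write time, dot product at query time"; the function names here are illustrative, not any engine's API:

```python
import numpy as np

def ingest(vectors: np.ndarray) -> np.ndarray:
    """Normalize rows to unit length so cosine ranking needs only dot products."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k by cosine similarity, computed as one matrix-vector product."""
    q = query / np.linalg.norm(query)  # normalize the query once, not per stored row
    scores = index @ q                 # d multiply-adds per stored vector, no norms
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(1)
index = ingest(rng.standard_normal((1000, 128)))
top = search(index, rng.standard_normal(128))
```

Because the stored rows are unit-length, no norm or division appears on the query path; the whole scan is a single SIMD-friendly dot-product loop per vector.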
Practice and quantization
In practice, many systems optimize by storing normalized vectors and using dot product for ranking, which is equivalent to cosine and benefits from SIMD and GPU kernels. Scalar quantization (e.g. INT8) can reduce memory and speed up dot product further; see scalar quantization (SQ): float32 to INT8 for details.
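A minimal symmetric INT8 scalar-quantization sketch, assuming per-vector max-abs scaling (real SQ implementations vary by engine):

```python
import numpy as np

def quantize(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values to int8 with a per-vector scale factor."""
    scale = float(np.abs(v).max()) / 127.0
    return np.round(v / scale).astype(np.int8), scale

def int8_dot(qa, sa, qb, sb) -> float:
    """Dot product on int8 codes; accumulate in int32 to avoid overflow."""
    return int(qa.astype(np.int32) @ qb.astype(np.int32)) * sa * sb

rng = np.random.default_rng(2)
a, b = rng.standard_normal(768), rng.standard_normal(768)
qa, sa = quantize(a)
qb, sb = quantize(b)

approx = int8_dot(qa, sa, qb, sb)
exact = float(a @ b)
# approx tracks exact closely, at 4x less memory per vector (int8 vs float32)
```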
Pipeline summary: at collection creation, choose a metric that matches your embedding model and use case; prefer cosine (normalized + dot product) when direction-only is sufficient. Measure latency and throughput on your hardware, since CPU architecture and GPU vs. CPU query serving can shift the balance. Normalized vs. unnormalized distance scores affect how you interpret returned values but not the per-comparison cost once the metric is fixed.
When choosing a metric, balance recall and semantics with these implementation costs, especially at scale. Trade-off: dot product is fastest and most SIMD-friendly; L2 and L1 are slightly heavier but still linear in d; custom or matrix-based metrics (e.g. Mahalanobis) can dominate cost at high dimensions.
Frequently Asked Questions
Which metric is fastest in practice?
Dot product on pre-normalized vectors (cosine equivalent). Single loop, no sqrt or division; best SIMD/GPU utilization.
Does L2² avoid the square root?
Yes. Squared L2 preserves neighbor order; many indexes use L2² and only take sqrt when returning a human-readable distance.
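A tiny check of the claim above: because the square root is strictly monotonic, L2² and L2 produce identical nearest-neighbor orderings, so an index can skip the sqrt entirely (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
points = rng.standard_normal((100, 64))
query = rng.standard_normal(64)

sq = np.sum((points - query) ** 2, axis=1)  # L2 squared: no sqrt
l2 = np.sqrt(sq)                            # full L2 distance

# Same neighbor ranking either way
assert np.array_equal(np.argsort(sq), np.argsort(l2))
```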
How does dimension affect cost?
All common metrics scale linearly in dimension d. Dot product: d multiply-adds; L2: d subtractions, d squares, a sum, and one sqrt. Higher d amplifies the difference.
What about Mahalanobis?
Requires d×d covariance (or its inverse) and matrix-vector ops—much heavier than L2 or dot product. See Mahalanobis distance.
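A hedged sketch of why Mahalanobis is heavier: each comparison involves a d×d matrix-vector product (O(d²)) instead of d multiply-adds. The covariance inverse is precomputed once; the data here is synthetic for illustration:

```python
import numpy as np

def mahalanobis(a: np.ndarray, b: np.ndarray, cov_inv: np.ndarray) -> float:
    """Mahalanobis distance; the d x d matvec makes each pair O(d^2)."""
    diff = a - b
    return float(np.sqrt(diff @ cov_inv @ diff))

rng = np.random.default_rng(4)
d = 64
samples = rng.standard_normal((500, d))
cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))  # precompute once

a, b = rng.standard_normal(d), rng.standard_normal(d)
dist = mahalanobis(a, b, cov_inv)
# With cov_inv = identity, this reduces to plain Euclidean (L2) distance.
```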