
Similarity Metrics (Mathematical Foundations) · Topic 49

Minkowski Distance

Minkowski distance is a family of distance functions parameterized by p ≥ 1: D_p(a, b) = (Σ |aᵢ − bᵢ|^p)^(1/p). For p = 1 it is Manhattan (L1); for p = 2 it is Euclidean (L2). As p increases, the distance becomes more dominated by the largest coordinate difference; as p → ∞ it tends to the maximum of |aᵢ − bᵢ| (Chebyshev distance).
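The definition above can be sketched in a few lines of plain Python (the `minkowski` helper below is illustrative, not a library API). It shows the L1 and L2 special cases and how a large p already sits very close to the Chebyshev limit:

```python
def minkowski(a, b, p):
    """Minkowski distance D_p(a, b) = (sum |a_i - b_i|^p)^(1/p), defined for p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))    # Manhattan (L1): 7.0
print(minkowski(a, b, 2))    # Euclidean (L2): 5.0
print(minkowski(a, b, 100))  # ≈ 4, the Chebyshev distance max|a_i − b_i|
```

Note how the p = 100 result is already dominated by the largest coordinate difference (4), illustrating the p → ∞ behavior.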

Summary

  • Satisfies metric properties for p ≥ 1. In vector DBs and ANN, L1 and L2 are most common; other p in research or domain settings.
  • Smaller p (e.g. L1) spreads influence; larger p emphasizes the dimension with the biggest difference. For most workloads, L2 or cosine (and optionally L1) are standard.
  • As p → ∞, Minkowski tends to Chebyshev (max of coordinate differences). For p < 1, the triangle inequality fails.
  • Pipeline: choose L1 or L2 per collection; general p is rarely supported in production VDBs—see custom distance functions if you need non-standard metrics.
  • Trade-off: L1 more robust to outliers; L2 better optimized in hardware and indexes; general p rarely worth the implementation cost.

Metric properties and common cases

Minkowski distance satisfies the properties of a metric (non-negativity, identity, symmetry, triangle inequality) for p ≥ 1. In vector databases and ANN, L1 and L2 are by far the most common; other p values are used in some research or domain-specific settings (e.g. robust statistics, image retrieval).

Support for general Minkowski in production VDBs is less common than for L2 and cosine. Practical tip: unless your domain explicitly uses a specific p (e.g. L1 for robust retrieval), default to L2 or cosine for nearest neighbor search and compare recall before adopting L1 or other p.

Choosing p

Choosing p trades off sensitivity to outliers: smaller p (e.g. L1) spreads influence across dimensions; larger p emphasizes the dimension with the biggest difference. For most embedding-based nearest neighbor workloads, sticking with L2 or cosine (and optionally L1 if the use case demands it) is standard.

Trade-off: L1 is more robust to a single dimension dominating; L2 is the “natural” Euclidean geometry and is heavily optimized in SIMD and index code. As p grows, the distance approaches the maximum coordinate difference (Chebyshev), useful in grid or game-like spaces. See when to use L2 vs. cosine for the common L2/cosine choice.
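The outlier sensitivity described above can flip nearest-neighbor rankings. A minimal sketch (toy vectors, illustrative `dist` helper): one candidate differs moderately in every dimension, the other has a single large outlier dimension, and L1 and L2 disagree on which is closer.

```python
def dist(u, v, p):
    """Minkowski-p distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

q = [0.0, 0.0, 0.0, 0.0]
x = [1.0, 1.0, 1.0, 1.0]  # small differences spread across all dimensions
y = [3.0, 0.0, 0.0, 0.0]  # one large difference in a single dimension

print(dist(q, x, 1), dist(q, y, 1))  # 4.0 3.0 -> under L1, y is closer
print(dist(q, x, 2), dist(q, y, 2))  # 2.0 3.0 -> under L2, x is closer
```

L1 penalizes the many small differences more; L2 lets the single large coordinate dominate, which is exactly the trade-off to keep in mind when choosing p.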

Pipeline and VDB support

Typical pipeline: pick L1 or L2 when creating the collection; most engines do not support arbitrary p. If you need a custom Minkowski p, check whether the engine supports custom distance functions or pre-transform your vectors so that L2 in the transformed space matches the desired behavior.
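If an engine exposes neither general p nor custom distance functions, one fallback is a brute-force re-rank outside the database. A minimal sketch, assuming in-memory vectors (the `knn_minkowski` helper is hypothetical, not an engine API):

```python
def minkowski(a, b, p):
    """Minkowski-p distance, p >= 1."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_minkowski(query, vectors, p, k):
    """Brute-force top-k under Minkowski-p; a fallback when the engine
    only indexes L1/L2. Feasible for re-ranking a small candidate set."""
    order = sorted(range(len(vectors)), key=lambda i: minkowski(query, vectors[i], p))
    return order[:k]

vecs = [[0.0, 1.0], [2.0, 2.0], [5.0, 0.0]]
print(knn_minkowski([0.0, 0.0], vecs, 3, 2))  # [0, 1]: the 2 nearest under L3
```

A common pattern is to retrieve a larger candidate set with the engine's native L2 index, then re-rank those candidates with the desired p.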

Practical tip: default to L2 (or cosine for text) unless you have a clear reason for L1 (e.g. sparse or count data, robustness to outliers). Computation overhead and hardware acceleration favor L2 and dot product; impact of distance metrics on recall can differ slightly between L1 and L2—validate on your data before committing.
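One way to "validate on your data" is to measure how much the top-k neighbor sets under L1 and L2 actually agree. A minimal sketch on synthetic Gaussian vectors (real embeddings would replace `vecs` and `q`; the helper names are illustrative):

```python
import random

def dist(u, v, p):
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def topk(q, vecs, p, k):
    """Indices of the k nearest vectors to q under Minkowski-p."""
    return set(sorted(range(len(vecs)), key=lambda i: dist(q, vecs[i], p))[:k])

random.seed(0)
vecs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
q = [random.gauss(0, 1) for _ in range(8)]
k = 10
overlap = len(topk(q, vecs, 1, k) & topk(q, vecs, 2, k)) / k
print(overlap)  # fraction of top-10 neighbors shared by L1 and L2
```

If the overlap is near 1.0 on your data, switching metrics changes little; a low overlap means the choice of p materially affects retrieval and deserves a proper recall evaluation.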

Frequently Asked Questions

What is Chebyshev distance?

Limit of Minkowski as p → ∞: the maximum of |aᵢ − bᵢ| over dimensions. Used in some grid or game-based applications.

Do vector DBs support arbitrary p?

Most support L1 and L2; general Minkowski is rarer. Check your engine; the custom distance functions topic covers non-standard metrics.

Why is L2 more common than other p?

L2 matches “natural” Euclidean geometry, is well optimized (SIMD, indexes), and many embedding models assume it or cosine. See when to use L2 vs. cosine.

Is Minkowski defined for p < 1?

For p < 1 the formula doesn’t satisfy the triangle inequality, so it’s not a metric. p ≥ 1 is standard.
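A concrete counterexample makes the failure visible. For p = 0.5, take the three 2-D points below: the direct "distance" from a to b exceeds the sum of the two legs through c, which a metric's triangle inequality forbids.

```python
def d(u, v, p):
    """Minkowski formula applied blindly; only a true metric for p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

a, b, c = (0.0, 0.0), (1.0, 1.0), (1.0, 0.0)
p = 0.5
print(d(a, b, p))               # 4.0
print(d(a, c, p) + d(c, b, p))  # 2.0 -> d(a, b) > d(a, c) + d(c, b): triangle inequality fails
```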