
Similarity Metrics (Mathematical Foundations) · Topic 44

Manhattan Distance (L1)

Manhattan distance (L1 norm) between two vectors is the sum of absolute differences along each dimension: L1(a, b) = Σ |aᵢ − bᵢ|. It is the distance you would travel along a grid (like city blocks), as opposed to the straight-line Euclidean (L2) distance. L1 is a special case of Minkowski distance with exponent 1.
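The formula above can be sketched directly in NumPy (the vectors here are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 5.0, 2.0])
b = np.array([4.0, 1.0, 2.0])

# L1: sum of absolute differences along each dimension
l1 = np.sum(np.abs(a - b))          # |1-4| + |5-1| + |2-2| = 3 + 4 + 0 = 7

# L2 for comparison: square root of the summed squared differences
l2 = np.sqrt(np.sum((a - b) ** 2))  # sqrt(9 + 16 + 0) = 5

print(l1, l2)  # 7.0 5.0
```

Note the grid-walk intuition: L1 (7) is never smaller than the straight-line L2 (5) for the same pair of vectors.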

Summary

  • L1 is less sensitive to large outliers in one dimension than L2 (no squaring). In high dimensions, nearest-neighbor structure can differ from L2.
  • Cheap to compute (abs + add, no multiply/sqrt); L2 and inner product often get more SIMD optimization. Use L1 when domain or literature favors it (e.g. robust statistics, sparse data).
  • L1 is a true metric (triangle inequality holds); it is Minkowski distance with exponent p = 1.
  • Pipeline: choose L1 as the collection metric when robustness to outliers or sparse/count data matters; check that your VDB and index support L1.
  • Not all ANN indexes support L1; graph and IVF-style indexes may offer it as an option—verify in your engine’s docs.

Comparison with L2

L1 tends to be less sensitive to large outliers in a single dimension than L2, because it doesn’t square the differences. In high-dimensional spaces it behaves differently from L2: “corners” of the L1 ball are more prominent, and nearest-neighbor structure can change.

Some applications (e.g. sparse signals, certain retrieval tasks) use L1 by design; many vector DBs and ANN libraries support it as an option alongside L2 and cosine. Trade-off: L1 can give different nearest neighbors than L2, especially when a few dimensions have large deviations; choose L1 when that robustness is desired (e.g. noisy or high-dynamic-range features).
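The outlier-sensitivity difference is easy to demonstrate with a constructed example: one candidate has a single large deviation, the other has moderate deviations in every dimension, and the nearest neighbor flips depending on the metric.

```python
import numpy as np

q = np.zeros(4)                       # query
x = np.array([0.0, 0.0, 0.0, 10.0])  # one large outlier dimension
y = np.array([3.0, 3.0, 3.0, 3.0])   # moderate differences everywhere

def l1(u, v): return np.sum(np.abs(u - v))
def l2(u, v): return np.sqrt(np.sum((u - v) ** 2))

# Under L1, x is closer (10 < 12); under L2, squaring makes the
# outlier dominate, so y is closer (6 < 10).
print(l1(q, x), l1(q, y))  # 10.0 12.0
print(l2(q, x), l2(q, y))  # 10.0 6.0
```

Squaring amplifies the single 10-unit deviation to 100 under L2, while L1 counts it only once; this is the robustness property described above.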

Computation and when to use L1

Computationally, L1 is cheap (absolute value and add, no multiply or sqrt), though in practice L2 and inner product often get more hardware optimization (SIMD). When choosing a metric, use L1 if your problem or domain literature favors it (e.g. robust statistics, sparse data); otherwise L2 and cosine remain the most common for general-purpose nearest neighbor search.

Practical tip: if your embeddings are sparse or count-based, or if you have known outlier dimensions, try L1 and compare recall vs. L2 on a small eval set. For how L1 compares to L2 and other metrics in CPU cost, see computation overhead of different distance metrics.

Pipeline and indexing

Typical pipeline: embed or featurize data, create a collection with L1 as the distance metric (if supported), and run k-NN or ANN search. Not all indexes support L1; graph-based indexes (e.g. HNSW) and IVF-style indexes may offer L1 as an option. Verify in your engine’s docs—support is less universal than for L2 and cosine/IP.
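As a minimal stand-in for the search step of that pipeline, here is brute-force k-NN under L1 over a toy corpus of random vectors (a real deployment would use the engine's L1 index option rather than a full scan):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))  # toy "collection" of 100 vectors
query = rng.normal(size=8)

# Exact L1 distance from the query to every corpus vector
l1_dists = np.sum(np.abs(corpus - query), axis=1)

# Top-k neighbors: smallest L1 distance = closest
k = 5
top_k = np.argsort(l1_dists)[:k]
print(top_k, l1_dists[top_k])
```

Brute force is also useful as ground truth when measuring the recall of an ANN index configured with L1.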

Trade-off: L1 can improve robustness and recall on noisy or sparse data, but at the cost of less ecosystem support and sometimes different recall–latency behavior than L2. The impact of distance metrics on recall varies by dataset; run recall@k on a labeled set with both L1 and L2 and choose the metric that matches your notion of similarity. When thresholding for “good” matches, remember that L1 is a distance (smaller = closer); see thresholding and normalized vs. unnormalized distance scores when interpreting scores.
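One quick way to see how much the metric choice matters on your data is to measure the overlap between L1 and L2 top-k results (a rough proxy; a proper eval uses labeled relevance judgments). A sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
corpus = rng.normal(size=(200, 16))
queries = rng.normal(size=(20, 16))
k = 10

def top_k(q, metric):
    if metric == "l1":
        d = np.sum(np.abs(corpus - q), axis=1)
    else:  # l2
        d = np.sqrt(np.sum((corpus - q) ** 2, axis=1))
    return set(np.argsort(d)[:k])

# Fraction of top-k neighbors shared by L1 and L2, averaged over queries.
# Values below 1.0 mean the two metrics rank results differently.
overlaps = [len(top_k(q, "l1") & top_k(q, "l2")) / k for q in queries]
mean_overlap = float(np.mean(overlaps))
print(mean_overlap)
```

Low overlap on your own embeddings is a signal that the L1-vs-L2 choice will visibly change results and deserves a recall@k comparison.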

Frequently Asked Questions

Is L1 a true metric?

Yes. L1 satisfies non-negativity, identity, symmetry, and the triangle inequality; see mathematical properties of a metric space.
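These axioms can be spot-checked numerically on sample vectors (not a proof, just a sanity check):

```python
import numpy as np

def l1(u, v):
    return np.sum(np.abs(u - v))

rng = np.random.default_rng(2)
a, b, c = rng.normal(size=(3, 6))

assert l1(a, a) == 0                     # identity: d(a, a) = 0
assert l1(a, b) == l1(b, a)              # symmetry
assert l1(a, b) >= 0                     # non-negativity
assert l1(a, c) <= l1(a, b) + l1(b, c)   # triangle inequality
print("metric axioms hold on this sample")
```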

When would I prefer L1 over L2?

When you want robustness to outliers (one dimension with a huge difference doesn’t dominate) or when your domain (e.g. sparse or count data) traditionally uses L1.

Do vector DBs support L1?

Many do as a metric option per collection. Check your engine’s docs; support is less universal than for L2 and cosine/IP.

How does L1 relate to Minkowski distance?

L1 is Minkowski with p = 1; L2 is p = 2. Minkowski distance defines the general family (L1, L2, and other p-norms).
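The whole family can be written as one function of p; L1 and L2 then fall out as special cases (reusing the example vectors from earlier):

```python
import numpy as np

def minkowski(a, b, p):
    # General Minkowski distance: (sum of |a_i - b_i|^p) ^ (1/p)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 5.0, 2.0])
b = np.array([4.0, 1.0, 2.0])

print(minkowski(a, b, 1))  # 7.0 == L1 (Manhattan)
print(minkowski(a, b, 2))  # 5.0 == L2 (Euclidean)
```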