Inner Product (Dot Product) Explained
The inner product (or dot product) of two vectors a and b is a · b = Σ aᵢ bᵢ: the sum of the products of corresponding components. It measures alignment and magnitude together: a large positive value means the vectors point in the same direction and have large norms; a negative value means they point in opposite directions. Many vector databases use the negative inner product as a distance so that “nearest” means maximum dot product.
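The definition above is one loop of multiply-adds. A minimal sketch in plain Python (the `dot` helper is ours, not any library's API):

```python
def dot(a, b):
    """Inner product: sum of products of corresponding components."""
    assert len(a) == len(b), "vectors must have the same dimension"
    return sum(x * y for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [4.0, -5.0, 6.0]
print(dot(a, b))  # 1*4 + 2*(-5) + 3*6 = 12.0
```

Production engines compute exactly this sum, just vectorized over many candidates at once.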
Summary
- For normalized vectors, dot product equals cosine similarity; ranking by dot product = ranking by cosine.
- Dot product is one fused multiply-add loop—fast and SIMD-friendly; see dot product and cosine on normalized vectors.
- When vectors are not normalized, dot product is sensitive to both direction and length; ANN libraries often offer “IP” or “cosine” (cosine via normalized + IP).
- As a distance, use negative inner product so “minimize distance” matches “maximize similarity”; indexes then return k nearest by −(a·b).
- Pipeline: choose IP or cosine per collection; if cosine, normalize at ingestion and query so the index can use IP under the hood.
Equivalence to cosine for unit vectors
For normalized vectors (unit length), the dot product equals cosine similarity: a · b = cos(a, b) when ‖a‖ = ‖b‖ = 1. So indexing with normalized vectors and ranking by dot product is equivalent to ranking by cosine—and dot product is a single fused multiply-add loop, which is very fast and SIMD-friendly.
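The equivalence is easy to check numerically. A small sketch (helper names `normalize` and `dot` are illustrative, not a library API):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a, b = [3.0, 4.0], [1.0, 2.0]
# Cosine similarity computed from the raw vectors...
cos = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# ...equals the plain dot product of their unit-length versions.
ip = dot(normalize(a), normalize(b))
print(abs(cos - ip) < 1e-12)  # True
```

This is why "cosine" collections can normalize once at write time and then run the cheaper IP comparison on every query.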
Many vector DBs implement “cosine” search by storing normalized vectors and using IP for comparisons. Practical tip: when creating a collection, pick either “cosine” or “IP” and be consistent; if you want cosine semantics, normalize before upsert and before each query so the engine can use IP internally. See hardware acceleration (SIMD) for distance calculations for how this is exploited in practice.
The mathematical relationship is exact: for unit vectors, cos(a,b) = a·b. So any index that supports inner product and receives normalized vectors effectively supports cosine similarity with no extra computation. This is why “cosine” and “IP” are often the same code path in vector DBs; the only difference is whether the system normalizes for you or expects pre-normalized input.
When vectors are not normalized
When vectors are not normalized, dot product is sensitive to both direction and length. That can be desirable (e.g. when magnitude encodes importance or confidence) or undesirable. ANN libraries and vector DBs often offer “IP” (inner product) or “cosine” as metric options; under the hood, cosine is usually implemented by storing normalized vectors and using IP for comparisons.
Trade-off: unnormalized IP lets magnitude influence ranking, which can help when length is meaningful (e.g. relevance scores). When length is noise (e.g. batch-dependent scale), normalize so that dot product matches cosine and ranking reflects direction only. For how to interpret and threshold scores in each case, see normalized vs. unnormalized distance scores.
Practical tip: if your embedding model outputs unnormalized vectors and you care about direction more than magnitude, normalize once at ingestion and at query time and use IP (or “cosine”) in the index. If the model outputs unit vectors by default, you can use IP directly without extra normalization; check the model docs to avoid double-normalizing or inconsistent scaling.
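The magnitude sensitivity described above can flip a ranking. A contrived sketch (vectors chosen to make the effect obvious; helpers are illustrative):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

query = [1.0, 0.0]
docs = {"aligned_short": [0.9, 0.1], "misaligned_long": [5.0, 5.0]}

# Raw IP: the long vector wins on magnitude alone.
raw_winner = max(docs, key=lambda k: dot(query, docs[k]))
# Normalized (cosine semantics): the better-aligned vector wins.
cos_winner = max(docs, key=lambda k: dot(query, normalize(docs[k])))
print(raw_winner, cos_winner)  # misaligned_long aligned_short
```

Whether the raw-IP behavior is a feature or a bug depends on whether length carries signal in your embeddings.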
Pipeline and index configuration
Typical pipeline: embed with your model and decide whether to normalize (for cosine semantics, normalize). Create a collection with distance type “IP” or “cosine”; for cosine, the DB typically expects unit-length vectors. Upsert document vectors, and at query time pass the query vector in the same format. Returned scores are either raw dot products (for IP) or cosine-equivalent (when normalized); use them for thresholding or re-ranking.
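The pipeline can be sketched with a toy brute-force index standing in for a real vector DB. `TinyIndex` and its methods are hypothetical, invented here to show where normalization happens; real engines expose their own client APIs:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class TinyIndex:
    """Brute-force stand-in for a vector DB collection (illustrative only)."""
    def __init__(self, metric="cosine"):
        # "cosine" → normalize at upsert and query, then compare with IP.
        self.metric = metric
        self.items = {}

    def upsert(self, key, vec):
        self.items[key] = normalize(vec) if self.metric == "cosine" else vec

    def query(self, vec, k=2):
        q = normalize(vec) if self.metric == "cosine" else vec
        ranked = sorted(self.items.items(), key=lambda kv: -dot(q, kv[1]))
        return [key for key, _ in ranked[:k]]

idx = TinyIndex(metric="cosine")
idx.upsert("a", [1.0, 0.0])
idx.upsert("b", [10.0, 10.0])
print(idx.query([1.0, 0.1], k=1))  # ['a']
```

The key point: with `metric="cosine"`, the comparison loop itself is still plain IP; normalization happens once per vector at the boundaries.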
Hardware acceleration (SIMD) applies well to dot product: one loop over dimensions with multiply-adds. Many engines optimize this path; L2 and L1 may use different code paths. When in doubt, prefer cosine (normalized + IP) for text semantics and IP or L2 for other cases where magnitude matters.
Index build and query both use the same distance type; switching from IP to L2 (or vice versa) requires re-indexing. Impact of distance metrics on recall can differ slightly between IP and L2 for unnormalized data; for normalized data they give the same k-nearest neighbors. Define a good match score using thresholding on the raw dot product or on a derived distance (e.g. 1 − IP for unit vectors) depending on your collection configuration.
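The derived distance mentioned above (1 − IP for unit vectors) is a convenient basis for thresholding. A sketch, where the cutoff value is purely illustrative and should be tuned on your own data:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# For unit vectors, 1 - (a · b) ranges from 0 (identical direction)
# to 2 (opposite direction).
a = normalize([1.0, 1.0])
b = normalize([1.0, 0.9])

distance = 1.0 - dot(a, b)
THRESHOLD = 0.05  # illustrative cutoff, not a recommendation
print(distance < THRESHOLD)  # True: a close match
```

Whatever cutoff you pick, define it against the score the collection actually returns, since "raw IP" and "1 − IP" thresholds are not interchangeable.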
Frequently Asked Questions
Why use negative inner product as distance?
So that “minimize distance” matches “maximize similarity”: a smaller −(a·b) means a larger a·b, which means more similar. ANN indexes typically minimize distance, so they return the k items with the smallest −(a·b).
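A one-liner makes the sign flip concrete (helper and candidate names are illustrative):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 0.0]
candidates = {"same_dir": [2.0, 0.0], "orthogonal": [0.0, 2.0], "opposite": [-2.0, 0.0]}

# Sorting by ascending -(a·b) is sorting by descending similarity.
order = sorted(candidates, key=lambda k: -dot(query, candidates[k]))
print(order)  # ['same_dir', 'orthogonal', 'opposite']
```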
Can dot product be negative?
Yes. Negative dot product means vectors point in opposite directions (angle > 90°). For embeddings that can be negative, this is normal; see handling negative values in vector components.
Do I need to normalize before using dot product?
Only if you want cosine semantics (direction only). If magnitude is meaningful, use unnormalized vectors and dot product (or L2) as chosen for your collection.
How does dot product compare to L2 for ANN?
For normalized vectors, nearest under L2 is the same as nearest under (negative) dot product. For unnormalized, order can differ; choose based on whether magnitude should affect ranking. See when to use L2 vs. cosine similarity.
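The L2/IP equivalence for unit vectors follows from the identity ‖a − b‖² = 2 − 2(a · b) when ‖a‖ = ‖b‖ = 1, which a short check confirms (helpers are illustrative):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, 2.0])

# For unit vectors: ||a - b||^2 = 2 - 2 (a · b), so ascending L2
# order is exactly descending dot-product order.
print(abs(l2_sq(a, b) - (2 - 2 * dot(a, b))) < 1e-12)  # True
```

Since 2 − 2(a·b) is a monotone decreasing function of a·b, the two metrics produce identical k-nearest-neighbor sets on normalized data.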