Normalized vs. unnormalized distance scores
Normalized scores lie in a known, bounded range (e.g. cosine similarity in [−1, 1], or angular distance in [0, π]). Unnormalized scores (e.g. raw dot product or L2 distance) can grow with vector dimension and magnitude, so their scale depends on your data and embedding model.
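To make the ranges concrete, here is a minimal sketch (assuming NumPy; the random 384-d vectors are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

# Cosine similarity: normalized, always bounded in [-1, 1].
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Raw dot product and L2 distance: unnormalized, their scale depends
# on vector norms and dimension, so there is no fixed range.
dot = a @ b
l2 = np.linalg.norm(a - b)

assert -1.0 <= cosine <= 1.0
assert l2 >= 0.0
```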
Summary
- Normalized scores make thresholding easier (“cosine > 0.8” is portable). Unnormalized L2/dot product vary with dimension and normalization—document which you use.
- Ranking is often what matters: order is the same for cosine and dot product when vectors are pre-normalized. APIs may rescale to 0–1 for display.
- Cosine is in [−1, 1]; L2 is unbounded above. Document score type and range in APIs and config so thresholds are interpretable.
- Pipeline: normalize at ingest/query for bounded scores, or document “raw” metric and scale so downstream can set thresholds correctly.
- Trade-off: bounded scores (cosine) simplify thresholding; raw L2/dot product can be faster but need per-model threshold tuning.
Why normalized scores help
Normalized scores make thresholding and interpretation easier: “cosine > 0.8” is a portable rule of thumb across many setups, even if the exact best threshold still depends on the task. Unnormalized L2 or dot product can vary with embedding dimension, normalization of the vectors, and model; a “good” threshold for 384-d vectors may not work for 768-d or for unnormalized outputs.
So when you expose scores to users or downstream systems, documenting whether they are normalized (and which metric) avoids confusion. Practical tip: if your API returns “similarity” in 0–1, document whether that is rescaled cosine, dot product, or something else so that thresholding and A/B tests use comparable cutoffs.
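As a sketch of why the documentation matters: if an API rescales cosine into 0–1 (a common convention, assumed here; the helper name is hypothetical), a familiar cosine cutoff moves with it:

```python
def cosine_to_similarity(cos: float) -> float:
    """Hypothetical rescaling: map cosine in [-1, 1] to a 0-1 "similarity"."""
    return (cos + 1.0) / 2.0

# A "cosine > 0.8" rule becomes "similarity > 0.9" under this rescaling,
# so clients must know which scale the API reports.
assert abs(cosine_to_similarity(0.8) - 0.9) < 1e-12
```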
Ranking and API presentation
For vector databases, ranking is often what matters: when vectors are pre-normalized, cosine and dot product produce the same neighbor order. So internally the DB may store normalized vectors and compute a plain dot product for speed (no need to divide by norms again), while the API presents a “similarity” rescaled to a 0–1 range.
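A small sketch of that ranking equivalence (assuming NumPy, with document magnitudes deliberately varied):

```python
import numpy as np

rng = np.random.default_rng(1)
# Documents with deliberately varied magnitudes.
docs = rng.normal(size=(5, 64)) * rng.uniform(0.5, 5.0, size=(5, 1))
query = rng.normal(size=64)

# Cosine computed on the raw vectors.
cos = (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

# Dot product on pre-normalized vectors (what a DB might store and compute).
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
dot = docs_unit @ (query / np.linalg.norm(query))

# Same neighbor order either way.
assert np.array_equal(np.argsort(-cos), np.argsort(-dot))
```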
Understanding whether your scores are normalized helps you set thresholds and compare across queries or collections. Trade-off: normalized scores (e.g. cosine) are bounded and portable; unnormalized L2 or dot product can be faster to compute (no extra norm division) but require per-model or per-collection threshold tuning. Guidance on thresholding and on defining a good match score can help when choosing cutoffs.
Pipeline summary: at ingest and query, either normalize (for bounded, cosine-like scores) or document the raw metric and its typical scale. Expose score type and range in API docs so clients can set thresholds, and recalibrate when changing the embedding model or dimension. The choice of distance metric affects recall (retrieval completeness), while normalized vs. unnormalized affects score interpretation; both shape end quality.
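A minimal sketch of the “normalize at ingest and query” option (assuming NumPy; `CosineIndex` is a hypothetical toy, not a real client):

```python
import numpy as np

class CosineIndex:
    """Toy index: normalizing at ingest makes a plain dot product
    at query time equal to cosine, so scores stay in [-1, 1]."""

    def __init__(self, dim: int):
        self._vecs = np.empty((0, dim))

    def add(self, vecs: np.ndarray) -> None:
        # Normalize once at ingest; store only unit vectors.
        self._vecs = np.vstack(
            [self._vecs, vecs / np.linalg.norm(vecs, axis=1, keepdims=True)]
        )

    def search(self, query: np.ndarray, k: int = 3):
        # Normalize the query, then a dot product is exactly cosine.
        scores = self._vecs @ (query / np.linalg.norm(query))
        top = np.argsort(-scores)[:k]
        return top, scores[top]  # bounded, cosine-like scores
```

If you instead keep raw vectors, document the metric and its typical scale so downstream consumers can tune thresholds, and recalibrate whenever the embedding model or dimension changes.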
Frequently Asked Questions
What is a normalized score?
A score that lies in a fixed range (e.g. cosine in [−1, 1], or 0–1 after rescaling), so the scale is the same across queries and collections for that metric.
Why does my dot product vary so much across queries?
Dot product depends on vector norms and dimension. If vectors aren’t normalized, scores scale with magnitude; normalize for stable scale or use cosine.
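To see the dependence on magnitude concretely (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
query, doc = rng.normal(size=128), rng.normal(size=128)

def cosine(x, y):
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Doubling a document's magnitude doubles its dot product...
assert np.isclose((2 * doc) @ query, 2 * (doc @ query))
# ...but leaves its cosine similarity unchanged.
assert np.isclose(cosine(2 * doc, query), cosine(doc, query))
```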
Can the DB return “similarity” when it uses dot product?
Yes. It can store normalized vectors, compute dot product internally, and map the result to a 0–1 “similarity” in the API for consistency.
Do I need to normalize for thresholding?
Not strictly, but normalized scores (e.g. cosine) make it easier to pick and reuse thresholds across models and dimensions.