Performance, Evaluation & Benchmarking · Topic 177

Benchmarking different distance metrics

Different distance metrics (L2, cosine, dot product, etc.) have different computational cost and hardware behavior. Benchmarking them on your data and hardware ensures you compare recall and latency fairly when choosing a metric.

Summary

  • Metrics differ in cost: for normalized vectors cosine = dot product (often faster); L2 needs squared differences and optionally sqrt. See computation overhead and impact on recall.
  • SIMD/GPU may favor one metric (e.g. dot product is very GPU-friendly). Fix index type and parameters, then run the same workload per metric and measure recall@k and latency.
  • Use a relevance set (ground truth) for retrieval quality; let your embedding model and the L2-vs.-cosine question guide the metric choice, then validate that choice with benchmarks.

Computational and hardware differences

For normalized vectors, cosine similarity is equivalent to dot product, which is often faster (no sqrt, one pass). L2 requires squared differences and a square root (or you can compare squared L2 to avoid the sqrt). See computation overhead of different distance metrics and impact of distance metrics on recall. SIMD and GPU implementations may favor one metric over another; e.g. dot product is very GPU-friendly.
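The equivalences above can be checked numerically. A minimal sketch with numpy (the vector sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=128)
x = rng.normal(size=128)

# Normalize both vectors to unit length.
qn = q / np.linalg.norm(q)
xn = x / np.linalg.norm(x)

# For unit vectors, cosine similarity and dot product coincide:
# no extra norm division is needed after normalization.
cosine = np.dot(qn, xn)
dot = qn @ xn
assert np.isclose(cosine, dot)

# Squared L2 skips the sqrt; since sqrt is monotonic,
# ranking by squared L2 equals ranking by L2.
l2 = np.linalg.norm(qn - xn)
l2_sq = np.sum((qn - xn) ** 2)
assert np.isclose(l2 ** 2, l2_sq)

# For unit vectors, ||q - x||^2 = 2 - 2 * (q · x),
# so L2 ranking and dot-product ranking agree as well.
assert np.isclose(l2_sq, 2 - 2 * dot)
```

The last identity is why normalized embeddings let you pick whichever metric your hardware runs fastest: the nearest-neighbor ranking is the same.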

How to benchmark fairly

When benchmarking: fix the index type and parameters, then run the same workload with each metric and measure recall@k and latency. Use a relevance set (ground truth) if you care about retrieval quality, not just speed. Your embedding model and when to use L2 vs. cosine should drive the metric choice; benchmarks then validate the performance of that choice on your stack.

Pipeline: fix index and params, run same workload per metric, measure recall and latency. Practical tip: match metric to your embedding (e.g. normalized → dot product).
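This pipeline can be sketched end to end. The snippet below is a toy sketch, not a production harness: the `search` function is a brute-force stand-in for a fixed ANN index, the data is synthetic, and `recall_at_k` is a hypothetical helper. Only the metric changes between runs; data, queries, and k stay fixed.

```python
import time
import numpy as np

def recall_at_k(retrieved: np.ndarray, relevant: np.ndarray) -> float:
    """Fraction of ground-truth neighbors found in the retrieved sets."""
    hits = sum(len(set(r) & set(g)) for r, g in zip(retrieved, relevant))
    return hits / relevant.size

def search(queries, corpus, metric, k):
    """Brute-force top-k search; stands in for a fixed index type/params."""
    if metric == "dot":
        scores = queries @ corpus.T                 # higher is better
        return np.argsort(-scores, axis=1)[:, :k]
    if metric == "l2":
        d = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(-1)
        return np.argsort(d, axis=1)[:, :k]         # squared L2, sqrt skipped
    raise ValueError(metric)

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)
queries = rng.normal(size=(50, 64)).astype(np.float32)
# Normalize so cosine, dot product, and L2 all rank identically.
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

k = 10
truth = search(queries, corpus, "l2", k)            # exact ground truth

for metric in ("dot", "l2"):
    t0 = time.perf_counter()
    result = search(queries, corpus, metric, k)
    latency_ms = (time.perf_counter() - t0) * 1000
    print(f"{metric}: recall@{k}={recall_at_k(result, truth):.2f}, "
          f"{latency_ms:.1f} ms")
```

With a real ANN index (e.g. HNSW), you would build one index per metric with identical parameters and reuse the same query set and ground truth.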

Frequently Asked Questions

Why is dot product often faster than L2?

For normalized vectors, dot product equals cosine similarity and is a single pass (multiply-accumulate) with no square root. L2 needs a squared difference per dimension plus a sqrt (or you compare squared L2, which preserves the ranking). Dot product is also very SIMD- and GPU-friendly.

What should I keep fixed when benchmarking metrics?

Keep index type (e.g. HNSW, IVF), index parameters, dataset, and query set the same. Only change the distance metric. Then measure recall@k and latency (e.g. p50, p99) so the comparison is fair.
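Per-query latency percentiles can be collected with a small helper. A sketch (assuming numpy; `measure_latency` and the brute-force stand-in search are hypothetical, and a few warm-up queries are excluded so cold caches don't skew p99):

```python
import time
import numpy as np

def measure_latency(search_fn, queries, n_warmup=10):
    """Per-query latency in ms; warm-up runs excluded from the stats."""
    for q in queries[:n_warmup]:
        search_fn(q)                                # warm caches / JIT
    times = []
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        times.append((time.perf_counter() - t0) * 1000)
    t = np.array(times)
    return {"p50": np.percentile(t, 50), "p99": np.percentile(t, 99)}

# Stand-in for a fixed index: brute-force dot-product top-10.
rng = np.random.default_rng(1)
corpus = rng.normal(size=(2000, 64)).astype(np.float32)
test_queries = rng.normal(size=(100, 64)).astype(np.float32)
stats = measure_latency(lambda q: np.argsort(-(corpus @ q))[:10], test_queries)
print(stats)
```

Run this once per metric against the same index and query set, and compare the percentiles alongside recall@k.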

Should I choose the metric by benchmark speed alone?

No. Your embedding model and use case (e.g. L2 vs. cosine) should drive the metric; benchmarks validate that the chosen metric meets your latency and recall SLOs on your hardware.

What is a relevance set (ground truth) for benchmarking?

A set of (query, relevant document IDs) pairs—either labeled data or derived from clicks/ratings. You run retrieval with each metric and compute recall@k against this set to compare retrieval quality, not just raw speed.
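Computing recall@k against such a set is a few lines. The query and document IDs below are made up for illustration:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant documents appearing in the top-k results."""
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

# Hypothetical relevance set: query -> labeled relevant doc IDs.
relevance = {"q1": {"d3", "d7"}, "q2": {"d1"}}
# Hypothetical retrieval results (ranked) for one metric.
results = {"q1": ["d3", "d9", "d7", "d2"], "q2": ["d5", "d8", "d4", "d1"]}

scores = [recall_at_k(results[q], rel, k=3) for q, rel in relevance.items()]
print(sum(scores) / len(scores))  # → 0.5 (q1 finds both, q2 misses d1 in top 3)
```

Averaging the per-query scores gives a single recall@k number per metric, which you can then weigh against the latency measurements.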