Measuring Recall@K
Recall@K is the fraction of the true top-K nearest neighbors (according to the chosen distance metric) that are actually returned by the approximate index. It measures how accurate the ANN search is when you ask for K results.
Summary
- Recall@K is the fraction of the true top-K nearest neighbors (by the chosen distance metric) that are actually returned by the approximate index. It measures ANN search accuracy when you ask for K results; it is always between 0 and 1, and higher is better.
- Compute: for each query, get the ground-truth top-K (via brute force or an exact index); run the ANN search to get result set R; then Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries.
- ANN indexes trade recall against latency (e.g. a higher ef in HNSW improves recall but increases query time). Benchmarks such as ANN-Benchmarks report recall@K at different latency/throughput points; see the recall-latency trade-off.
- Practical tip: evaluate on a held-out query set and fix K (e.g. 10) so numbers are comparable.
How to compute Recall@K
To compute it: for each query, obtain the ground truth top-K (e.g. via brute-force exact search on a small dataset, or a trusted exact index). Run the ANN search and get the result set R. Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries for a single number (e.g. “Recall@10 = 0.95” means 95% of the true top-10 are in the returned set). Recall@K is always between 0 and 1; higher is better.
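The steps above can be sketched in a few lines of NumPy. This is a minimal, self-contained example: the base vectors and queries are random toy data, ground truth comes from brute-force Euclidean search, and the "ANN index" is a crude stand-in that only scans a random 80% of the base (any real index would be substituted here), so some true neighbors can be missed.

```python
import numpy as np

def recall_at_k(approx_ids, true_ids, k):
    """Fraction of the true top-k ids present in the approximate result."""
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 32))   # toy base vectors
queries = rng.standard_normal((5, 32))   # toy queries
k = 10

# Crude stand-in for an ANN index: search only a random 80% of the base.
subset = rng.choice(len(base), size=800, replace=False)

recalls = []
for q in queries:
    # Ground truth top-k via brute-force exact search over the full base.
    true_topk = np.argsort(np.linalg.norm(base - q, axis=1))[:k]
    # "ANN" result: exact search restricted to the candidate subset.
    d = np.linalg.norm(base[subset] - q, axis=1)
    approx = subset[np.argsort(d)[:k]]
    recalls.append(recall_at_k(approx, true_topk, k))

print(f"Recall@{k} = {np.mean(recalls):.2f}")
```

With a real index you would replace the subset scan with the index's query call; everything else, including the ground-truth step and the averaging, stays the same.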
Recall vs. latency and benchmarks
ANN indexes trade off recall vs. latency: increasing ef in HNSW or probing more clusters in IVF usually improves recall but increases query time. Benchmarks (e.g. ANN-Benchmarks) report recall@K curves at different latency or throughput points so you can choose an operating point that fits your recall-latency trade-off. The pipeline is always the same: get ground truth, run the ANN search, intersect its results with the ground truth, divide by K. Practical tip: use a held-out query set and fix K (e.g. 10) for comparable numbers.
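The trade-off can be observed directly by sweeping a search parameter and measuring recall and per-query time together. The sketch below uses the same kind of stand-in as above: the "index" scans a random candidate pool, and growing the pool plays the role of raising ef (HNSW) or nprobe (IVF). With a real index you would sweep the actual parameter instead; the measurement loop is unchanged.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
base = rng.standard_normal((5000, 32))
queries = rng.standard_normal((20, 32))
k = 10

# Ground-truth top-k per query via brute-force exact search.
truth = [np.argsort(np.linalg.norm(base - q, axis=1))[:k] for q in queries]

# Sweep the candidate-pool size; bigger pool ~ higher ef/nprobe.
for pool_size in (500, 1000, 2500, 5000):
    cand = rng.choice(len(base), size=pool_size, replace=False)
    t0 = time.perf_counter()
    recalls = []
    for q, gt in zip(queries, truth):
        d = np.linalg.norm(base[cand] - q, axis=1)
        found = cand[np.argsort(d)[:k]]
        recalls.append(len(set(found) & set(gt)) / k)
    ms = (time.perf_counter() - t0) / len(queries) * 1e3
    print(f"pool={pool_size:5d}  recall@{k}={np.mean(recalls):.2f}  {ms:.2f} ms/query")
```

At pool_size equal to the full base the scan is exact, so recall reaches 1.0 at the highest cost; the smaller pools show the cheaper, lossier operating points that a recall-latency curve summarizes.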
Frequently Asked Questions
What is Recall@K?
The fraction of the true top-K nearest neighbors (by the chosen distance metric) that are actually returned by the approximate index. Measures how accurate ANN search is when you ask for K results. Always between 0 and 1; higher is better.
How do I compute it?
For each query, obtain ground truth top-K (e.g. brute-force exact search). Run ANN and get result set R. Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries for a single number (e.g. Recall@10 = 0.95 means 95% of true top-10 are in the returned set).
How does recall relate to latency?
ANN indexes trade recall vs. latency: increasing ef in HNSW or probing more clusters in IVF usually improves recall but increases query time. See recall-latency trade-off curve.
Where do I find recall benchmarks?
Benchmarks like ANN-Benchmarks report recall@K curves at different latency or throughput so you can choose an operating point. Compare algorithms (e.g. HNSW vs. IVF-PQ) on the same dataset.