Measuring Recall@K
Recall@K is the fraction of the true top-K nearest neighbors (according to the chosen distance metric) that are actually returned by the approximate index. It measures how accurate the ANN search is when you ask for K results.
Summary
- Recall@K is the fraction of the true top-K nearest neighbors (by the chosen distance metric) that are actually returned by the approximate index. It measures ANN search accuracy when you ask for K results; it is always between 0 and 1, and higher is better.
- Compute: for each query, get the ground-truth top-K (via brute force or an exact index); run the ANN search to get result set R; then Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries.
- ANN indexes trade recall against latency (e.g. a higher ef in HNSW improves recall but increases query time). Benchmarks such as ANN-Benchmarks report recall@K at different latency/throughput points; see the recall-latency trade-off.
- Practical tip: evaluate on a held-out query set and fix K (e.g. 10) so numbers are comparable.
How to compute Recall@K
To compute it: for each query, obtain the ground truth top-K (e.g. via brute-force exact search on a small dataset, or a trusted exact index). Run the ANN search and get the result set R. Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries for a single number (e.g. “Recall@10 = 0.95” means 95% of the true top-10 are in the returned set). Recall@K is always between 0 and 1; higher is better.
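The steps above can be sketched in a few lines of NumPy. This is a minimal, self-contained example: the base vectors and queries are random toy data, ground truth comes from brute-force Euclidean search, and the "ANN index" is a crude stand-in that only scans a random 80% of the base (any real index would be substituted here), so some true neighbors can be missed.

```python
import numpy as np

def recall_at_k(approx_ids, true_ids, k):
    """Fraction of the true top-k ids present in the approximate result."""
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 32))   # toy base vectors
queries = rng.standard_normal((5, 32))   # toy queries
k = 10

# Crude stand-in for an ANN index: search only a random 80% of the base.
subset = rng.choice(len(base), size=800, replace=False)

recalls = []
for q in queries:
    # Ground truth top-k via brute-force exact search over the full base.
    true_topk = np.argsort(np.linalg.norm(base - q, axis=1))[:k]
    # "ANN" result: exact search restricted to the candidate subset.
    d = np.linalg.norm(base[subset] - q, axis=1)
    approx = subset[np.argsort(d)[:k]]
    recalls.append(recall_at_k(approx, true_topk, k))

print(f"Recall@{k} = {np.mean(recalls):.2f}")
```

With a real index you would replace the subset scan with the index's query call; everything else, including the ground-truth step and the averaging, stays the same.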
Recall vs. latency and benchmarks
ANN indexes trade off recall vs. latency: increasing ef in HNSW or probing more clusters in IVF usually improves recall but increases query time. Benchmarks (e.g. ANN-Benchmarks) report recall@K curves at different latency or throughput points so you can choose an operating point that fits your recall-latency trade-off. The pipeline is always the same: get ground truth, run the ANN search, intersect its results with the ground truth, divide by K. Practical tip: use a held-out query set and fix K (e.g. 10) for comparable numbers.
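The trade-off can be observed directly by sweeping a search parameter and measuring recall and per-query time together. The sketch below uses the same kind of stand-in as above: the "index" scans a random candidate pool, and growing the pool plays the role of raising ef (HNSW) or nprobe (IVF). With a real index you would sweep the actual parameter instead; the measurement loop is unchanged.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
base = rng.standard_normal((5000, 32))
queries = rng.standard_normal((20, 32))
k = 10

# Ground-truth top-k per query via brute-force exact search.
truth = [np.argsort(np.linalg.norm(base - q, axis=1))[:k] for q in queries]

# Sweep the candidate-pool size; bigger pool ~ higher ef/nprobe.
for pool_size in (500, 1000, 2500, 5000):
    cand = rng.choice(len(base), size=pool_size, replace=False)
    t0 = time.perf_counter()
    recalls = []
    for q, gt in zip(queries, truth):
        d = np.linalg.norm(base[cand] - q, axis=1)
        found = cand[np.argsort(d)[:k]]
        recalls.append(len(set(found) & set(gt)) / k)
    ms = (time.perf_counter() - t0) / len(queries) * 1e3
    print(f"pool={pool_size:5d}  recall@{k}={np.mean(recalls):.2f}  {ms:.2f} ms/query")
```

At pool_size equal to the full base the scan is exact, so recall reaches 1.0 at the highest cost; the smaller pools show the cheaper, lossier operating points that a recall-latency curve summarizes.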
Frequently Asked Questions
What is Recall@K?
The fraction of the true top-K nearest neighbors (by the chosen distance metric) that are actually returned by the approximate index. Measures how accurate ANN search is when you ask for K results. Always between 0 and 1; higher is better.
How do I compute it?
For each query, obtain ground truth top-K (e.g. brute-force exact search). Run ANN and get result set R. Recall@K = |R ∩ GroundTruth_K| / K. Average over many queries for a single number (e.g. Recall@10 = 0.95 means 95% of true top-10 are in the returned set).
How does recall relate to latency?
ANN indexes trade recall vs. latency: increasing ef in HNSW or probing more clusters in IVF usually improves recall but increases query time. See recall-latency trade-off curve.
Where do I find recall benchmarks?
Benchmarks like ANN-Benchmarks report recall@K curves at different latency or throughput so you can choose an operating point. Compare algorithms (e.g. HNSW vs. IVF-PQ) on the same dataset.