Performance, Evaluation & Benchmarking · Topic 169

The Recall-Latency Trade-off curve

The recall-latency trade-off describes the inverse relationship between search quality (recall@K) and query latency: to get higher recall, an ANN index typically does more work (e.g. more graph hops, more list scans), which increases response time.
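To make "search quality" concrete, recall@K is just the fraction of the true top-K neighbors that the ANN index actually returned. A minimal sketch (the result ids below are made up for illustration):

```python
# Sketch: computing recall@K for one query, given the exact top-K ids
# (from brute-force search) and the ids an ANN index returned.
def recall_at_k(exact_ids, ann_ids, k):
    """Fraction of the true top-k neighbors that the ANN result recovered."""
    return len(set(exact_ids[:k]) & set(ann_ids[:k])) / k

# Hypothetical result sets for a single query with K = 5.
exact = [3, 17, 42, 8, 91]    # ground truth from exact search
approx = [3, 42, 8, 55, 91]   # what the ANN index returned
print(recall_at_k(exact, approx, 5))  # 4 of the 5 true neighbors found -> 0.8
```

In a benchmark, this value is averaged over many queries; the latency side of the trade-off is simply the time each query took.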

Summary

  • The recall-latency trade-off is the inverse relationship between recall@K and query latency: higher recall requires more work per query (more graph hops, more list scans), and therefore higher response time.
  • Varying index parameters (e.g. HNSW ef, IVF nprobe) traces out a curve: at low latency, recall is lower; at high latency, recall approaches 1. The curve is index- and dataset-dependent. Benchmarks plot recall vs. latency or QPS so you can pick an operating point (e.g. 95% recall at 5 ms).
  • In production, choose a point that meets your application's requirements (e.g. 90% recall at <20 ms for real-time, 99% recall for batch). See the HNSW trade-off and ANN-Benchmarks. Pipeline: vary ef/nprobe, measure recall and latency, plot the curve. Practical tip: 90–95% recall is a good target for many applications; tune on a representative workload.

The curve and parameters

By varying index parameters (e.g. HNSW ef, IVF nprobe), you get a curve: at low latency, recall is lower; at high latency, recall approaches 1 (or the brute-force baseline). The curve is index- and dataset-dependent. Benchmarks plot recall (e.g. recall@10) on the y-axis and latency or QPS on the x-axis so you can pick an operating point (e.g. “95% recall at 5 ms”).
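The sweep above can be sketched end to end without any ANN library. Here a toy "index" scans only the first `n_scan` vectors of a fixed random permutation, so `n_scan` plays the role of ef or nprobe: more work per query buys higher recall at higher latency. The data is synthetic and the timings are machine-dependent; this is an illustration of the measurement loop, not a real index.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((5000, 32)).astype(np.float32)  # database vectors
xq = rng.standard_normal((20, 32)).astype(np.float32)    # query vectors
K = 10

def knn(queries, base, k):
    # Exact k-NN by brute-force squared-L2 distance.
    d = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
    return np.argsort(d, axis=1)[:, :k]

truth = knn(xq, xb, K)  # ground-truth neighbors for recall@10

# Toy stand-in for an ANN parameter: scan only the first n_scan vectors
# of a fixed random permutation. Larger n_scan = more work per query.
perm = rng.permutation(len(xb))
for n_scan in (500, 1000, 2500, 5000):
    cand = perm[:n_scan]
    t0 = time.perf_counter()
    ids = cand[knn(xq, xb[cand], K)]  # map candidate-local ids back to global
    latency_ms = (time.perf_counter() - t0) * 1000 / len(xq)
    recall = np.mean([len(set(t) & set(a)) / K for t, a in zip(truth, ids)])
    print(f"n_scan={n_scan:5d}  recall@10={recall:.2f}  latency={latency_ms:.3f} ms")
```

Plotting the printed (latency, recall) pairs gives exactly the curve the benchmarks show; with a real index you would vary ef or nprobe instead of `n_scan`.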

Choosing an operating point

In production, you choose a point that meets your application’s requirements: a recommendation engine might accept 90% recall at under 20 ms for real-time UX, while a batch pipeline might allow 99% recall with higher latency. The trade-off in HNSW and similar indexes is controlled by these parameters, and the curve lets you compare different algorithms (e.g. HNSW vs. IVF-PQ) on the same dataset in tools like ANN-Benchmarks. The pipeline is simple: vary ef/nprobe, measure recall and latency at each setting, and plot the curve. Practical tip: 90–95% recall is a good target for many applications; tune on a representative workload, not a synthetic one.
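Once the curve is measured, selecting an operating point is a small search over the measured settings: keep the points that fit the latency budget, then take the highest-recall one. A minimal sketch (the parameter names and numbers below are illustrative, not measured):

```python
# Measured points on a hypothetical HNSW curve: (setting, recall@10, latency_ms).
curve = [
    ("ef=16",  0.82,  1.2),
    ("ef=64",  0.93,  3.5),
    ("ef=128", 0.97,  6.8),
    ("ef=512", 0.995, 21.0),
]

def pick_operating_point(curve, latency_budget_ms):
    """Highest-recall setting that still meets the latency budget."""
    feasible = [p for p in curve if p[2] <= latency_budget_ms]
    return max(feasible, key=lambda p: p[1]) if feasible else None

print(pick_operating_point(curve, 20.0))   # real-time budget -> ef=128, 0.97 recall
print(pick_operating_point(curve, 100.0))  # batch budget -> ef=512, 0.995 recall
```

The same selection can be run the other way (lowest latency meeting a recall floor), depending on which constraint is hard for your application.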

Frequently Asked Questions

What is the recall-latency trade-off?

It is the inverse relationship between search quality and speed: to get higher recall@K, an ANN index typically does more work (e.g. more graph hops, more list scans), which increases response time. So higher recall usually means higher latency on the same hardware.

How do I get a recall-latency curve?

Vary index parameters (e.g. HNSW ef, IVF nprobe) and measure recall and latency at each setting. At low latency, recall is lower; at high latency, recall approaches 1. Benchmarks plot recall (e.g. recall@10) vs. latency or QPS so you can pick an operating point. The curve is index- and dataset-dependent.

What operating point should I choose?

It depends on your application: a recommendation engine might accept 90% recall at under 20 ms for real-time UX, while a batch pipeline might allow 99% recall with higher latency. See the HNSW trade-off and ANN-Benchmarks to compare algorithms.

How do I compare HNSW vs. IVF-PQ on this curve?

Run both on the same dataset with varying parameters; plot recall vs. latency or QPS. ANN-Benchmarks and similar tools do this. The curve helps you choose algorithm and parameters for your latency and recall requirements.