Basic Fundamentals · Topic 16

Exact vs. Approximate Nearest Neighbor (ANN) search

Exact nearest-neighbor search compares the query vector to every point in the collection and returns the true k nearest. It’s correct but O(n) per query, so it doesn’t scale. Approximate nearest neighbor (ANN) uses an index (e.g. HNSW, IVF) to return neighbors that are “close enough” with high probability, in sublinear time.

Summary

  • Exact k-NN: compare query to every vector; O(n); correct but doesn’t scale.
  • ANN: use an index (HNSW, IVF, etc.) to return near-optimal neighbors in sublinear time; trade a small loss in recall for latency and throughput.
  • Parameters (e.g. efSearch, nprobe) tune the recall–latency trade-off; 95–99% recall is often acceptable for semantic search and recommendations.
  • Vector databases typically use ANN for the main query path; the curse of dimensionality makes exact search impractical at scale.

Exact k-NN: correct but expensive

Exact k-nearest neighbor search computes the distance (or similarity) between the query vector and every point in the collection, then returns the k with smallest distance (or largest similarity). That guarantees the true k nearest neighbors but requires O(n) distance computations per query. For millions or billions of vectors, that’s too slow for interactive use and doesn’t scale with throughput. Exact search is still used for small collections, for validation (e.g. to measure recall of an ANN index), or when correctness is non-negotiable.

Brute-force exact search can be parallelized (e.g. over vector chunks or GPU threads) to speed up single queries, but the total work remains O(n). For very small n (e.g. under 10k vectors), some systems skip building an index and use exact search by default to avoid index build cost and parameter tuning.
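The brute-force baseline above can be sketched in a few lines of plain Python (Euclidean distance; the function name is illustrative). Note the O(n) distance computations over the whole collection:

```python
import math

def exact_knn(query, vectors, k):
    """Brute-force exact k-NN: score every vector, keep the k nearest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # O(n) distance computations, then rank all n candidates.
    ranked = sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))
    return ranked[:k]

vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [5.0, 5.0]]
print(exact_knn([0.0, 0.0], vectors, 2))  # → [0, 2]
```

Because it always returns the true top k, a routine like this is exactly what you use as ground truth when measuring an ANN index’s recall.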

ANN: approximate but scalable

Approximate nearest neighbor (ANN) uses an index structure to avoid scanning every vector. Algorithms like HNSW (graph-based) or IVF (cluster-based) organize vectors so that the search can “navigate” to likely neighbors and evaluate only a fraction of the dataset. The result is approximate: you get neighbors that are “close enough” with high probability, not always the exact top k. ANN algorithms trade a small loss in recall for large gains in latency and throughput. You tune parameters (e.g. HNSW’s efSearch, IVF’s nprobe) to balance speed and accuracy. For most semantic search and recommendation use cases, 95–99% recall is acceptable and ANN is the only practical option at scale.

ANN indexes are built once (or updated incrementally) and then serve many queries. Build time and memory footprint depend on the algorithm and parameters; at query time, higher efSearch or nprobe usually improves recall but increases latency. Benchmarking against exact k-NN on a sample lets you choose parameters that meet your recall target.
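Benchmarking against exact k-NN amounts to computing recall@k per query: the fraction of the true top k that the ANN index actually returned. A minimal sketch (the IDs here are hypothetical results for one query):

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-k neighbors present in the ANN result."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

# Hypothetical single-query results, k = 5:
exact = [3, 17, 42, 8, 99]   # ground truth from exact k-NN
approx = [3, 42, 8, 99, 57]  # ANN returned 4 of the true 5
print(recall_at_k(exact, approx))  # → 0.8
```

In practice you average this over a sample of queries, then raise efSearch or nprobe until the average meets your recall target.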

Why exact search doesn’t scale in high dimensions

The curse of dimensionality means that in high dimensions, exact search can’t prune the space efficiently with simple structures (e.g. B-trees or k-d trees). There’s no one-dimensional order that corresponds to “nearest,” so you can’t avoid checking a large fraction of points. ANN indexes exploit graph or clustering structure to navigate to likely neighbors without scanning everything. Vector databases typically use ANN under the hood for the main query path. See the recall–latency trade-off curve and HNSW’s tuning parameters for how to balance the two.
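One way to see the distance-concentration effect behind this is to compare the nearest and farthest distances from a query to random points as dimensionality grows (a toy demo, not a benchmark; uniform random data):

```python
import math
import random

def nearest_farthest_ratio(dim, n=1000, seed=0):
    """Ratio of nearest to farthest distance from a random query to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum((q - x) ** 2 for q, x in zip(query, point))))
    return min(dists) / max(dists)

# In low dimensions the nearest neighbor is meaningfully closer than the
# farthest point (small ratio); in high dimensions distances concentrate
# (ratio approaches 1), so there is little structure left to prune on.
print(nearest_farthest_ratio(2))
print(nearest_farthest_ratio(512))
```

When nearest and farthest are almost equally far away, any pruning rule based on coarse distance bounds eliminates almost nothing, which is why exact indexes that work in 2–3 dimensions degrade to near-linear scans here.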

Frequently Asked Questions

When should I use exact search instead of ANN?

Use exact search when the collection is small (e.g. under 100k vectors), when you need 100% recall (e.g. for compliance), or when you’re benchmarking ANN recall. For production at scale, ANN is the default.

What recall can I expect from ANN?

With tuned parameters, 95–99% recall@k is common. It depends on index type, efSearch / nprobe, and data. Measure with a ground-truth exact set; see measuring recall at k and ANN benchmarks.

Does ANN work with metadata filtering?

Yes. Systems combine ANN with metadata filtering via pre- or post-filtering. Pre-filtering can reduce the candidate set before ANN; post-filtering applies filters after ANN. Both can affect recall and latency.
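A common post-filtering pattern is to over-fetch from the ANN index and then apply the metadata predicate. A minimal sketch (the ANN index below is a toy stand-in; real systems expose this through query options):

```python
def post_filter_search(ann_search, query, k, predicate, overfetch=4):
    """Post-filtering: over-fetch ANN candidates, then apply the metadata filter.

    Over-fetching (k * overfetch candidates) compensates for candidates the
    filter drops; a very selective filter may still leave fewer than k hits,
    which is one reason selective filters are often better handled by
    pre-filtering instead.
    """
    candidates = ann_search(query, k * overfetch)
    return [c for c in candidates if predicate(c)][:k]

# Toy stand-in for an ANN index: items with an id and a metadata tag,
# already ordered by similarity to the query.
items = [{"id": i, "lang": "en" if i % 2 == 0 else "de"} for i in range(20)]
fake_ann = lambda query, n: items[:n]

hits = post_filter_search(fake_ann, query=None, k=3, predicate=lambda c: c["lang"] == "de")
print([h["id"] for h in hits])  # → [1, 3, 5]
```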

Can I switch from exact to ANN without re-ingesting?

Usually, yes: you build an ANN index over the same stored vectors, so re-ingestion isn’t required. Many vector databases build the index at ingestion time; if you started with a flat (exact) index, you may need to rebuild the index or create a new collection with an ANN index type. See your product’s docs for supported index types and dynamic indexing.