
Indexing Algorithms - IVF & Quantization · Topic 99

Search space reduction techniques

Search space reduction is the idea that, instead of comparing the query to every vector (brute force), an index limits the set of candidates. Examples: IVF restricts search to nprobe clusters; HNSW follows graph edges so only a fraction of nodes are visited; LSH only searches within matching hash buckets; tree-based indexes prune branches. The goal is to keep recall high while cutting the number of distance evaluations and thus latency and cost.
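A toy numpy sketch of the idea: score a small candidate subset instead of every vector. The candidate set is faked here (in practice an index such as IVF or HNSW produces it); the point is that the number of distance evaluations drops from n to the candidate-set size.

```python
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 16)).astype(np.float32)  # database vectors
q = rng.standard_normal(16).astype(np.float32)           # query

# Brute force: one distance evaluation per database vector (1000 here).
full = np.linalg.norm(xb - q, axis=1)
true_nn = int(np.argmin(full))

# Search space reduction: score only a candidate subset (50 evaluations).
# Stand-in candidate set; a real index tries to ensure it contains the
# true neighbors without computing all distances first.
candidates = np.argsort(full)[:50]
sub = np.linalg.norm(xb[candidates] - q, axis=1)
approx_nn = int(candidates[np.argmin(sub)])

print(approx_nn == true_nn)  # True: the candidate set contains the true NN
```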

Summary

  • Clustering (IVF): only the nprobe nearest clusters are searched. Graph (HNSW): roughly O(log n), often a few hundred nodes visited. LSH: only matching hash buckets. Trees: branches are pruned. Compare how much each reduces the candidate set; tune nprobe, efSearch, etc.
  • See graph-based vs. cluster-based trade-offs.
  • Pipeline: query → index restricts candidates (IVF/HNSW/LSH/tree) → compute distances on candidate set (often quantized) → return top-k.
  • Often combined with quantization: reduce candidates first, then fast distance on PQ/SQ codes. Smaller search space usually means lower recall; tune to meet target.
  • Practical tip: compare IVF (nprobe), HNSW (efSearch), LSH (number of tables), and trees (search_k) on your data; IVFPQ (IVF combined with PQ) and Annoy (tree-based) are concrete implementations.

Main approaches

Clustering (IVF): only vectors in the nprobe nearest clusters are scored; the rest are never touched. With nlist=10k and nprobe=32, you search roughly 0.3% of clusters (and a variable fraction of vectors, depending on cluster sizes).

Graph (HNSW): search follows edges from an entry point; typically O(log n), or a few hundred nodes, are visited.

LSH: only buckets that the query hashes to are scanned.

Trees: branches that cannot contain a nearer neighbor than the current best are pruned.

Each method has parameters (nprobe, efSearch, number of tables, tree depth) that trade off how much of the space is searched against recall.
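The IVF case can be made concrete with a minimal numpy sketch (toy k-means trainer, tiny dataset and parameter values; not a production index). Only the inverted lists of the nprobe nearest centroids are scanned:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, nlist, nprobe = 8, 2000, 20, 3
xb = rng.standard_normal((n, d)).astype(np.float32)

# Train coarse centroids with a few k-means iterations (toy trainer).
centroids = xb[rng.choice(n, nlist, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((xb[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        members = xb[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Inverted lists: cluster id -> ids of vectors assigned to that cluster.
invlists = {c: np.flatnonzero(assign == c) for c in range(nlist)}

def ivf_search(q, k=5):
    # Probe only the nprobe nearest clusters; the rest are never touched.
    order = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([invlists[c] for c in order])
    dists = ((xb[cand] - q) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]], len(cand)

q = rng.standard_normal(d).astype(np.float32)
ids, n_scored = ivf_search(q)
print(f"scored {n_scored}/{n} vectors ({100 * n_scored / n:.1f}%)")
```

With nprobe=3 of 20 clusters, only a small fraction of the 2000 vectors is ever scored.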

Combining with quantization

Search space reduction is often combined with quantization: first reduce the set of candidates (IVF, HNSW, LSH), then compute distances on compressed representations (PQ, SQ) so that scanning the candidate set is fast and memory-efficient. Together they enable billion-scale ANN with acceptable latency and recall.
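The "fast distance on compressed codes" half can be sketched as product quantization with asymmetric distance computation (ADC). This is a toy version (m=4 subquantizers, 16 centroids each, trained with a few k-means steps); real libraries typically use 8-bit codes and optimized kernels:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, m, ks = 16, 5000, 4, 16   # m subquantizers, ks centroids each
dsub = d // m
xb = rng.standard_normal((n, d)).astype(np.float32)

# Train one tiny codebook per subspace (a few k-means steps each).
codebooks = np.empty((m, ks, dsub), np.float32)
for j in range(m):
    sub = xb[:, j * dsub:(j + 1) * dsub]
    cb = sub[rng.choice(n, ks, replace=False)].copy()
    for _ in range(8):
        a = np.argmin(((sub[:, None] - cb[None]) ** 2).sum(-1), axis=1)
        for c in range(ks):
            pts = sub[a == c]
            if len(pts):
                cb[c] = pts.mean(axis=0)
    codebooks[j] = cb

# Encode: each d-dim vector becomes m small codes (here 4 bits' worth each).
codes = np.stack([
    np.argmin(((xb[:, j * dsub:(j + 1) * dsub][:, None]
                - codebooks[j][None]) ** 2).sum(-1), axis=1)
    for j in range(m)
], axis=1).astype(np.uint8)

def adc_search(q, k=5):
    # Precompute per-subspace distance tables once per query...
    tables = np.stack([
        ((codebooks[j] - q[j * dsub:(j + 1) * dsub]) ** 2).sum(-1)
        for j in range(m)
    ])
    # ...then each candidate costs m table lookups, no float math on xb.
    dists = tables[np.arange(m), codes].sum(axis=1)
    return np.argsort(dists)[:k]

q = rng.standard_normal(d).astype(np.float32)
print(adc_search(q))
```

In an IVFPQ index the same lookup-table trick is applied only to the vectors inside the probed clusters.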

Pipeline and trade-offs

Pipeline: for each query, the index returns a candidate set (e.g. vectors in nprobe cells, or nodes reached by graph traversal); distances are computed on that set (optionally with PQ/SQ); top-k are returned. Trade-off: smaller search space cuts cost but can miss true neighbors; parameters (nprobe, efSearch, number of LSH tables, tree search_k) tune the recall–latency curve.
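The recall–cost trade-off can be measured directly. A toy sweep over nprobe with an IVF-style index (toy k-means, small synthetic data): because the probed-cluster sets are nested as nprobe grows, recall@1 is nondecreasing while the number of scanned vectors rises.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, nq, nlist = 8, 3000, 50, 30
xb = rng.standard_normal((n, d)).astype(np.float32)
xq = rng.standard_normal((nq, d)).astype(np.float32)

# Toy coarse quantizer: sampled centroids refined by a few k-means steps.
cent = xb[rng.choice(n, nlist, replace=False)].copy()
for _ in range(10):
    a = np.argmin(((xb[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    for c in range(nlist):
        if (a == c).any():
            cent[c] = xb[a == c].mean(axis=0)

# Exact 1-NN ground truth by brute force.
gt = np.argmin(((xb[None] - xq[:, None]) ** 2).sum(-1), axis=1)

recalls = []
for nprobe in (1, 2, 4, 8):
    hits = scanned = 0
    for qi, q in enumerate(xq):
        probe = np.argsort(((cent - q) ** 2).sum(-1))[:nprobe]
        cand = np.flatnonzero(np.isin(a, probe))
        if len(cand) == 0:   # all probed clusters empty (rare edge case)
            continue
        scanned += len(cand)
        best = cand[np.argmin(((xb[cand] - q) ** 2).sum(-1))]
        hits += int(best == gt[qi])
    recalls.append(hits / nq)
    print(f"nprobe={nprobe}: recall@1={recalls[-1]:.2f}, "
          f"scanned/query={scanned / nq:.0f}/{n}")
```

The same shape of experiment applies to efSearch (HNSW), table count (LSH), or search_k (trees).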

Practical tip: start with one technique (e.g. IVF or HNSW), measure recall and latency, then consider combining with quantization (IVFPQ) or adding re-ranking. Comparing graph-based vs. cluster-based indexes helps choose between HNSW and IVF.
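The re-ranking step mentioned above can be sketched as: build a shortlist with cheap approximate scores, then recompute exact distances only for the shortlist. Here a random 4-dim projection stands in for PQ/SQ scores; the sizes and projection are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 16, 4000
xb = rng.standard_normal((n, d)).astype(np.float32)
q = rng.standard_normal(d).astype(np.float32)

# Cheap approximate distances: squared distance in a random 4-dim
# projection (a stand-in for PQ/SQ code distances).
P = rng.standard_normal((d, 4)).astype(np.float32) / 2.0
approx = ((xb @ P - q @ P) ** 2).sum(-1)

shortlist = np.argsort(approx)[:100]          # approximate shortlist
exact = ((xb[shortlist] - q) ** 2).sum(-1)    # exact re-rank, 100 vectors only
topk = shortlist[np.argsort(exact)[:10]]
```

Only 100 exact distance computations are paid per query instead of 4000; recall depends on how often true neighbors survive the approximate shortlist.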

Frequently Asked Questions

What fraction of vectors does IVF typically search?

Roughly nprobe / nlist of the clusters; the fraction of vectors depends on cluster sizes. With balanced clusters, probing 1% of clusters means ~1% of vectors; nprobe is often tuned so that 1–10% of the dataset is searched.
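A quick check of why the cluster fraction and the vector fraction can differ. The Zipf-distributed cluster sizes are an illustrative assumption (real IVF cluster sizes are typically skewed, not balanced); if the probed clusters happen to be the large ones, the vector fraction exceeds the cluster fraction:

```python
import numpy as np

nlist, nprobe = 10_000, 100   # probe 1% of clusters
rng = np.random.default_rng(5)

# Assumed skewed cluster sizes, normalized to a 1M-vector dataset.
sizes = rng.zipf(1.3, nlist).astype(float)
sizes *= 1_000_000 / sizes.sum()

frac_clusters = nprobe / nlist
# Worst case: the probed clusters are the nprobe largest ones.
frac_vectors = sizes[np.argsort(sizes)[-nprobe:]].sum() / sizes.sum()
print(frac_clusters, frac_vectors)
```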

Does HNSW “reduce” search space?

Yes. Instead of scanning all n vectors, you traverse O(log n) or a few hundred nodes. The graph structure restricts which vectors are ever compared to the query.
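A single-layer best-first search in the spirit of HNSW's base layer, as a numpy sketch. The kNN graph is built by brute force here for simplicity (real HNSW builds it incrementally and adds a layer hierarchy); the point is that only the visited nodes are ever compared to the query.

```python
import heapq
import numpy as np

rng = np.random.default_rng(6)
d, n, M = 8, 1000, 8
xb = rng.standard_normal((n, d)).astype(np.float32)

# Toy kNN graph: M neighbors per node, built by brute force.
D = ((xb[:, None] - xb[None]) ** 2).sum(-1)
neighbors = np.argsort(D, axis=1)[:, 1:M + 1]

def greedy_search(q, ef=32):
    """Best-first search over the graph, keeping the ef best results."""
    dist = lambda i: float(((xb[i] - q) ** 2).sum())
    entry = 0
    visited = {entry}
    cand = [(dist(entry), entry)]    # min-heap: frontier to expand
    best = [(-dist(entry), entry)]   # max-heap: ef best results so far
    while cand:
        dc, c = heapq.heappop(cand)
        if dc > -best[0][0] and len(best) >= ef:
            break                    # frontier cannot improve the results
        for nb in neighbors[c]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(cand, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-d_, i) for d_, i in best), len(visited)

q = rng.standard_normal(d).astype(np.float32)
results, n_visited = greedy_search(q)
print(f"visited {n_visited}/{n} nodes")
```

Raising ef widens the beam: more nodes visited, higher recall, more distance evaluations.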

Can I combine IVF and HNSW?

Some systems do. For example: use IVF to select clusters, then run HNSW or flat search within those clusters; or build one HNSW graph per cluster. Such hybrids exist for specific scale and latency goals.

Why not always use the smallest search space?

Smaller search space usually means lower recall (you might miss true neighbors). You tune to the smallest space that still meets your recall target.