Indexing Algorithms - IVF & Quantization · Topic 102

Comparing Graph-based vs. Cluster-based indexes

Graph-based indexes (e.g. HNSW) represent vectors as nodes and connect neighbors with edges; search follows edges from an entry point. Cluster-based indexes (e.g. IVF) assign vectors to cells (e.g. Voronoi cells) and search only nprobe cells. Both achieve search space reduction and good recall-latency trade-offs, but they differ in structure, build cost, update friendliness, and how they scale on GPU or disk. This topic compares both families and when to choose or combine them.

Summary

Graph: high recall at low latency, good for point lookups and incremental add; higher memory per vector (links). Cluster: good batch throughput, easy to combine with IVFPQ, simpler to shard and run on disk/GPU.
Graph search is a greedy walk over neighbor edges; cluster search probes nprobe cells and scans their vectors (often with PQ).
Trade-offs: graph favors single-query latency and inserts; cluster favors batch throughput, quantization, and predictable memory.
Hybrid designs exist: e.g. IVF to select clusters, then HNSW or flat within; or HNSW for hot path and IVFPQ for cold/batch.
Practical tip: benchmark both on your data size and dimension; choose graph for low-latency OLTP-style workloads, cluster for bulk and GPU/disk.

Graph-based (HNSW)

Search is a greedy walk: start at an entry point, follow edges to closer neighbors until no improvement. Parameters like efSearch control how many candidates are considered per layer. The graph has no global training step: each insert connects to nearest neighbors in the structure, so incremental updates are natural.

Pros: often very high recall at low latency for single queries, incremental inserts, no need to retrain. Cons: memory for edges (e.g. M links per node in HNSW), build can be slower than IVF, and random access pattern can be harder to optimize for disk or batch GPU. Pipeline: embed query → enter graph at entry point → greedy walk (expand candidates with efSearch) → return top-k. Good for OLTP-style, low-latency serving.

Cluster-based (IVF)

Vectors belong to clusters (e.g. K-means); search probes nprobe nearest clusters and scans only those vectors (often with PQ). Build requires a training phase to compute centroids (nlist); then each vector is assigned to its nearest centroid. Query pipeline: embed query → find nprobe nearest centroids → scan only those lists (with optional PQ decode) → return top-k.

Pros: predictable memory (no per-node links), excellent batch and GPU utilization, straightforward to combine with quantization and to shard by cluster. Cons: need to choose and tune nlist/nprobe, full retrain or careful update for new data, and recall can plateau unless you probe more cells. Better for batch indexing and throughput-oriented workloads.

Trade-offs at a glance

Recall vs. latency: HNSW often gives higher recall at low latency on a single machine; IVF can match or exceed with higher nprobe and careful PQ, and scales well with batch size. Memory: Graph stores edges (e.g. 2×M×4 bytes per node); IVF stores list pointers and optional PQ codebooks. Updates: Graph supports incremental add (and with care, delete); IVF usually needs periodic retrain or overflow handling. Disk/GPU: IVF is often easier on disk (sequential scan of a few lists); graph (e.g. Vamana/Disk-ANN) uses careful layout and caching—both can work.

Choosing and combining

Use graph when you need single-query latency and incremental updates; use cluster (e.g. IVFPQ) when you need batch throughput, very large scale, or disk/GPU-friendly access. Some systems offer both and let you pick by collection or use hybrid designs (e.g. IVF first, then HNSW within cells).

Practical tip: run recall-latency curves for both on your dataset and dimension; if you have heavy batch ingestion or GPU/disk constraints, lean cluster; if you have real-time ingestion and low-latency queries, lean graph. You can also use IVF to select shards or clusters and HNSW or flat search within each for a hybrid.

Frequently Asked Questions

Which has better recall for the same latency?

Often HNSW gives higher recall at low latency on a single machine; IVF can match or exceed if you tune nprobe and use PQ carefully, and scales well with batch size.

Can I use both in one system?

Yes. Some DBs use IVF to select shards or clusters, then HNSW or flat search within; or use HNSW for hot path and IVFPQ for cold/batch.

Which is better for disk (SSD)?

IVF is often easier: sequential scan of a few lists. Graph (e.g. Vamana/Disk-ANN) is designed for disk with careful layout and caching; both can work.

Which uses more memory?

Graph indexes store edges (e.g. 2×M×4 bytes per node for HNSW), so often more per vector than IVF lists plus PQ codes. Exact ratio depends on nlist, M, and dimension.