Combining IVF and PQ (IVFPQ)
IVFPQ combines an inverted file (IVF) index with product quantization (PQ): vectors are assigned to Voronoi cells (via nlist centroids), and within each cell vectors are stored as PQ codes. At query time, only nprobe cells are searched, and distances are approximated using PQ. This gives both search space reduction and low memory, making IVFPQ a workhorse for large-scale ANN in libraries like Faiss.
Summary
- Build: run K-means to get nlist IVF centroids; train PQ codebooks (often on residuals after subtracting the assigned centroid, or on raw vectors); assign each vector to its nearest centroid and encode it as m PQ codes; store the codes in inverted lists. Optionally apply OPQ before PQ.
- Query: find the nprobe nearest centroids, then scan only those inverted lists using precomputed PQ distance tables. Choose between IVFPQ, HNSW, or GPU indexes based on scale and latency requirements.
- Trade-off: IVFPQ combines search-space reduction (IVF) with memory reduction (PQ); recall depends on nlist, nprobe, and the PQ parameters (m, k). A good fit at billion scale when build time and approximate recall are acceptable.
- Typically no raw vectors are stored, only PQ codes and the centroid assignment; re-ranking may use centroid + PQ decode or a separate full-precision store.
- Practical tip: tune nlist and nprobe for the recall vs. latency trade-off; use OPQ when PQ recall is insufficient. See search space reduction techniques and how PQ compresses vectors into codes for the two main ingredients.
Build and search
At build time, you run K-means to get nlist IVF centroids, then train PQ codebooks (often on the residuals after subtracting the vector’s assigned centroid, which improves quality). Each vector is assigned to its nearest centroid and encoded as PQ codes; the inverted list for each cell stores these codes. At query time, you find the nprobe nearest centroids to the query, gather the PQ codes in those lists, and compute approximate distances (e.g. via precomputed tables from the query subvectors to each codebook centroid). You return the k vectors with smallest approximate distance.
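The build and search steps above can be sketched end to end in NumPy. This is a minimal toy implementation, not the Faiss API: the `kmeans` helper, the parameter values, and the `search` function are all illustrative, and PQ is trained on residuals as described.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(x, k, iters=10):
    # Plain Lloyd's K-means; enough for a sketch.
    cent = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - cent[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                cent[j] = pts.mean(axis=0)
    return cent

d, m, ks, nlist = 8, 4, 16, 8          # dim, subspaces, codebook size, IVF cells
sub = d // m
xb = rng.standard_normal((500, d)).astype(np.float32)

# 1) Train IVF centroids and assign each vector to its nearest cell.
coarse = kmeans(xb, nlist)
cell = np.argmin(((xb[:, None] - coarse[None]) ** 2).sum(-1), axis=1)

# 2) Train PQ codebooks on residuals (vector minus its assigned centroid).
res = xb - coarse[cell]
books = [kmeans(res[:, i*sub:(i+1)*sub], ks) for i in range(m)]

# 3) Encode each vector as m PQ codes, grouped into inverted lists.
codes = np.stack(
    [np.argmin(((res[:, i*sub:(i+1)*sub][:, None] - books[i][None]) ** 2).sum(-1),
               axis=1) for i in range(m)], axis=1)
lists = {c: np.where(cell == c)[0] for c in range(nlist)}

def search(q, k=5, nprobe=3):
    probe = np.argsort(((coarse - q) ** 2).sum(-1))[:nprobe]
    ids, dists = [], []
    for c in probe:
        r = q - coarse[c]  # query residual relative to this cell's centroid
        # Distance table: query subvector -> every codebook centroid.
        tab = np.stack([((books[i] - r[i*sub:(i+1)*sub]) ** 2).sum(-1)
                        for i in range(m)])
        for idx in lists[c]:
            ids.append(idx)
            dists.append(tab[np.arange(m), codes[idx]].sum())
    order = np.argsort(dists)[:k]
    return [ids[i] for i in order]

print(search(xb[0]))
```

Note how the per-query distance tables make scanning cheap: each candidate's approximate distance is just m table lookups and a sum, regardless of the original dimension.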
When to use IVFPQ
IVFPQ is a good default for very large (e.g. 10M–1B+), memory- or disk-bound indexes when you can accept approximate recall and want efficient batch search. For lower latency and often higher recall at moderate scale, HNSW (possibly with quantization) is common. Many systems support both and let you choose or combine (e.g. HNSW for hot path, IVFPQ for batch or cold data).
Practical tip: train PQ codebooks on residuals (vector minus centroid) for better quality, and measure recall@K and latency for your chosen nprobe and PQ parameters (m, k). See What is IVF (inverted file index), the role of Voronoi cells in IVF, and the trade-off between compression ratio and accuracy for more detail on the components.
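Measuring recall@K means comparing the approximate results against an exact brute-force baseline. A small helper for this might look like the following (the function name and toy data are illustrative):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours recovered by the ANN search,
    averaged over all queries."""
    hits = [len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / (k * len(approx_ids))

# Toy usage: exact top-k from brute force as the ground truth.
rng = np.random.default_rng(1)
xb = rng.standard_normal((200, 16)).astype(np.float32)
xq = rng.standard_normal((10, 16)).astype(np.float32)
exact = [np.argsort(((xb - q) ** 2).sum(-1))[:5] for q in xq]
print(recall_at_k(exact, exact, 5))  # 1.0 by construction
```

In practice you would pass the IDs returned by the IVFPQ search as `approx_ids` and sweep nprobe, plotting recall against latency.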
Frequently Asked Questions
Are PQ codebooks global or per-cluster?
Usually global: one set of codebooks for the whole index, which is how Faiss's IndexIVFPQ works. Per-cluster codebooks (as in locally optimized product quantization, LOPQ) can improve accuracy at higher memory and build cost.
Do I store raw vectors in IVFPQ?
Typically not: only PQ codes and optionally the IVF centroid assignment are stored, which is how you get the large memory savings. For re-ranking you might keep a separate full-precision store or use centroid + PQ decode as an approximation.
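The centroid + PQ decode approximation mentioned above is simple: concatenate the looked-up sub-codebook entries and add back the cell centroid (assuming PQ was trained on residuals). A sketch, with a hypothetical `reconstruct` helper and toy codebooks:

```python
import numpy as np

def reconstruct(code, centroid, codebooks):
    """Approximate the original vector from its IVF centroid and PQ codes:
    look up each sub-codebook entry, concatenate, and add the centroid
    (assumes PQ was trained on residuals)."""
    residual = np.concatenate([codebooks[i][code[i]] for i in range(len(code))])
    return centroid + residual

# Toy example: 4-dim vector, 2 subspaces, 2 entries per sub-codebook.
centroid = np.array([1.0, 1.0, 1.0, 1.0])
codebooks = [np.array([[0.1, 0.2], [0.3, 0.4]]),
             np.array([[0.5, 0.6], [0.7, 0.8]])]
print(reconstruct([1, 0], centroid, codebooks))  # [1.3 1.4 1.5 1.6]
```

The reconstruction carries the full quantization error, so it is useful for coarse re-ranking but no substitute for a full-precision store when exact distances matter.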
How do I tune nlist and nprobe for IVFPQ?
nlist: often between sqrt(N) and 4×sqrt(N); a larger nlist gives a finer partition but raises training cost and typically requires a larger nprobe for the same recall. nprobe: sweep and measure recall@K and latency, increasing nprobe until recall meets your target.
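The sqrt(N)-based heuristic above is easy to express as a tiny helper (the function name is hypothetical):

```python
import math

def nlist_range(n):
    """Common starting range for nlist: sqrt(N) to 4*sqrt(N)."""
    root = math.isqrt(n)
    return root, 4 * root

print(nlist_range(1_000_000))  # (1000, 4000)

# A typical nprobe sweep then looks like:
#   for nprobe in (1, 2, 4, 8, 16, 32, 64):
#       run queries, record recall@K and latency, stop once recall >= target
```

Treat the range as a starting point only; the best nlist also depends on how clustered your data is and on your latency budget.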
Can IVFPQ run on GPU?
Yes. Faiss-GPU supports IVFPQ: clustering and PQ distance computation are highly parallel. See GPU-accelerated indexing.