
Indexing Algorithms - IVF & Quantization · Topic 88

Product Quantization (PQ) explained

Product quantization (PQ) compresses vectors by splitting each vector into several subvectors, then replacing each subvector with the index of its nearest centroid in a small, learned codebook. The full vector is represented by a short list of codes (one per subvector). At query time, distances are approximated using the same codebooks, often via precomputed distance tables, which makes PQ very fast and reduces memory compared to storing full-precision vectors.

Summary

  • A D-dimensional vector is split into m subvectors; each subspace has a codebook of k centroids (e.g. 256), so a vector is stored as m codes (e.g. 8 bits each). Distance = sum of subvector-to-centroid distances via lookup tables.
  • Often combined with IVF (IVFPQ); variants: OPQ, residual quantization.
  • Pipeline: train codebooks per subspace → encode each vector as m codes → at query, build distance tables and sum over codes. No decode needed for distance.
  • Trade-off: high compression and fast lookup-based distance, at the cost of approximation error; OPQ and RQ improve accuracy. The core tension is between compression ratio and accuracy: quantization shrinks the memory footprint but makes distances approximate.

Mechanics

PQ divides the D-dimensional vector into m subvectors of dimension D/m. For each subspace, a small number k of centroids (e.g. 256) is learned—that’s the codebook for that subspace. Each subvector is then encoded as the index (code) of its nearest centroid (e.g. 8 bits per subvector). So a 128-D float vector might become 8 subvectors × 8 bits = 64 bits of codes, plus the codebooks. Distance between a query and a database vector is approximated by summing the distances between the query’s subvectors and the database vector’s code centroids, which can be done with lookup tables and is very efficient.
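The full pipeline above (train per-subspace codebooks, encode vectors as codes, approximate distances via lookup tables) can be sketched in plain NumPy. This is a minimal illustration, not a production index: the tiny k-means, the toy sizes (dim 32, m=4, k=16), and all function names are choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebooks(X, m, k, iters=10):
    """Run a small k-means per subspace to learn one codebook each."""
    d = X.shape[1] // m  # subvector dimension D/m
    books = []
    for j in range(m):
        sub = X[:, j*d:(j+1)*d]
        cent = sub[rng.choice(len(sub), k, replace=False)].copy()
        for _ in range(iters):
            # assign each subvector to its nearest centroid, then recenter
            assign = ((sub[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        books.append(cent)
    return books

def encode(X, books):
    """Replace each subvector with the index of its nearest centroid."""
    m, d = len(books), X.shape[1] // len(books)
    codes = np.empty((len(X), m), dtype=np.uint8)
    for j, cent in enumerate(books):
        sub = X[:, j*d:(j+1)*d]
        codes[:, j] = ((sub[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
    return codes

def adc_distances(q, books, codes):
    """Asymmetric distance: precompute query-to-centroid tables, sum lookups."""
    m, d = len(books), len(q) // len(books)
    # tables[j, c] = squared L2 between query subvector j and centroid c
    tables = np.stack([((q[j*d:(j+1)*d] - cent) ** 2).sum(-1)
                       for j, cent in enumerate(books)])
    return tables[np.arange(m), codes].sum(1)

# toy data: 1000 vectors of dim 32, m=4 subvectors, k=16 centroids each
X = rng.standard_normal((1000, 32)).astype(np.float32)
books = train_codebooks(X, m=4, k=16)
codes = encode(X, books)          # 4 bytes of codes per vector
q = rng.standard_normal(32).astype(np.float32)
approx = adc_distances(q, books, codes)
exact = ((X - q) ** 2).sum(1)
print(np.corrcoef(approx, exact)[0, 1])  # approximate distances track exact ones
```

Note that the distance step never reconstructs a vector: it is m table lookups and a sum per candidate, which is why PQ scans are so fast.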

IVF + PQ

PQ is often combined with IVF (IVFPQ): IVF narrows the search to a few clusters, and PQ compresses the vectors stored in those clusters so that scanning them is fast and memory-efficient.
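A rough sketch of that combination, again in NumPy with toy sizes chosen here for illustration. One simplification is flagged in the comments: real IVFPQ implementations usually PQ-encode the residual (vector minus its coarse centroid) rather than the raw vector, which improves accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=10):
    """Tiny k-means; returns centroids and point assignments."""
    cent = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        assign = ((X[:, None] - cent[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            pts = X[assign == c]
            if len(pts):
                cent[c] = pts.mean(0)
    return cent, assign

# toy setup: dim 16, nlist=8 IVF clusters, PQ with m=4 subspaces, k=16 centroids
X = rng.standard_normal((2000, 16)).astype(np.float32)
nlist, m, k = 8, 4, 16
d = X.shape[1] // m

coarse, assign = kmeans(X, nlist)                     # IVF coarse quantizer
lists = [np.flatnonzero(assign == c) for c in range(nlist)]

# PQ codebooks trained on the raw vectors (real IVFPQ typically encodes
# residuals x - coarse_centroid for better accuracy; skipped for brevity)
books = [kmeans(X[:, j*d:(j+1)*d], k)[0] for j in range(m)]
codes = np.stack([
    ((X[:, j*d:(j+1)*d][:, None] - books[j][None]) ** 2).sum(-1).argmin(1)
    for j in range(m)], axis=1).astype(np.uint8)

def search(q, nprobe=2):
    """Probe the nprobe nearest IVF lists, then PQ-scan only those vectors."""
    probe = ((coarse - q) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    tables = np.stack([((q[j*d:(j+1)*d] - books[j]) ** 2).sum(-1)
                       for j in range(m)])
    dist = tables[np.arange(m), codes[cand]].sum(1)
    return cand[dist.argsort()]                        # candidates, best first

q = rng.standard_normal(16).astype(np.float32)
ids = search(q, nprobe=2)
print(len(ids), "candidates scanned instead of", len(X))
```

IVF cuts the number of candidates; PQ cuts the cost and memory of scanning each one.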

Practical tip: choose m and k (e.g. m=8, k=256) to balance compression and accuracy; larger m or k improves accuracy at the cost of code size and table-building work. Optimized product quantization (OPQ) applies a learned rotation to the space before PQ for better recall; residual quantization (RQ) refines the approximation over multiple stages.

Frequently Asked Questions

How is PQ distance computed at query time?

For each subvector of the query, compute distance to all k centroids of that subspace; store in a table. For each database vector, sum the table entries for its m codes. No need to decode vectors.
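A micro example with hand-picked numbers (m=2 subspaces, k=4 centroids each, subvector dimension 2) shows both steps concretely:

```python
import numpy as np

# two tiny codebooks, chosen by hand for this example
books = [np.array([[0., 0], [1, 0], [0, 1], [1, 1]]),
         np.array([[0., 0], [2, 0], [0, 2], [2, 2]])]
codes = np.array([1, 2], dtype=np.uint8)  # one database vector: centroids (1,0) and (0,2)
q = np.array([1., 0, 0, 2])               # query

# step 1: one table per subspace, distance from query subvector to every centroid
tables = [((q[i*2:(i+1)*2] - b) ** 2).sum(1) for i, b in enumerate(books)]
# step 2: sum the entries selected by the stored codes -- no decoding needed
dist = tables[0][codes[0]] + tables[1][codes[1]]
print(dist)  # 0.0: the query coincides with this vector's centroids
```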

What are typical m and k?

m = 8–32 subvectors; k = 256 (8 bits per code) is common. A 128-d vector with m=8 becomes 8 × 8 = 64 bits of codes, vs. 128 × 32 = 4096 bits as float32.
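The arithmetic behind that comparison, as a trivial helper (the function name is ours):

```python
import math

def pq_bits(m, k):
    """Bits per vector when each of m codes indexes one of k centroids."""
    return m * math.ceil(math.log2(k))

float_bits = 128 * 32                          # 128-d float32 vector
print(pq_bits(8, 256), "bits vs", float_bits)  # 64 bits vs 4096
print(float_bits / pq_bits(8, 256))            # 64.0x compression (codes only)
```

The codebooks themselves (m × k × (D/m) floats) are a fixed overhead shared across all vectors, so they matter little at scale.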

Does PQ work with cosine?

PQ approximates L2 well. For cosine, vectors are typically normalized to unit length before PQ is applied; the L2-based distance tables on normalized subvectors then rank neighbors the same way cosine similarity would.
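The reason this works is the identity ||a − b||² = 2 − 2·cos(a, b) for unit vectors, so L2 ranking on normalized vectors is exactly cosine ranking. A quick numeric check:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = rng.standard_normal(64), rng.standard_normal(64)
a /= np.linalg.norm(a)  # normalize to unit length
b /= np.linalg.norm(b)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b),
# so PQ's L2 tables rank neighbors exactly as cosine would.
l2_sq = ((a - b) ** 2).sum()
cos = a @ b
print(np.isclose(l2_sq, 2 - 2 * cos))  # True
```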

When is PQ better than scalar quantization?

PQ gives higher compression (e.g. 8 bytes for a 128-d vector) and very fast lookup-based distances; scalar quantization (SQ) is simpler and quantizes each dimension independently (e.g. float32 → int8, a 4× reduction). PQ is usually the better choice at very large scale, when memory is the binding constraint.