How PQ compresses vectors into codes
In product quantization (PQ), compression is done per subvector. Each subvector is compared to the centroids in that subspace’s codebook; the index of the nearest centroid becomes that subvector’s code. The full vector is stored as the concatenation of these codes—e.g. 8 bytes for 8 subvectors with 256 centroids each (8 bits per code). No raw floats are stored for the vectors; only the codes and the codebooks are kept, which is how PQ achieves large memory savings.
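The encoding step above can be sketched in a few lines of NumPy. This is an illustrative toy, not a library API: the codebooks here are random, where a real system would train them with k-means, and `encode` is a hypothetical helper name.

```python
import numpy as np

d, m, k = 128, 8, 256           # vector dim, subvectors, centroids per subspace
ds = d // m                     # each subvector has d/m = 16 dims

rng = np.random.default_rng(0)
codebooks = rng.standard_normal((m, k, ds))   # stand-in for k-means-trained codebooks

def encode(x):
    """Return m uint8 codes: for each subvector, the index of its nearest centroid."""
    subs = x.reshape(m, ds)                   # split the vector into m subvectors
    codes = np.empty(m, dtype=np.uint8)
    for i in range(m):
        dists = np.linalg.norm(codebooks[i] - subs[i], axis=1)
        codes[i] = np.argmin(dists)           # nearest centroid's index = the code
    return codes

x = rng.standard_normal(d)
print(encode(x))   # 8 one-byte codes instead of 512 bytes of float32
```

Only the codes and the shared codebooks are kept, exactly as described above.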
Summary
- Encode: split → argmin to codebook per subvector → output indices. Decode: replace code with centroid, concatenate. Search: sum per-subspace distances (query subvector to DB code’s centroid) via precomputed tables—no decode.
- Finer codebook (larger k) → better approximation, more bits. Tune m and k for compression vs. accuracy.
- Pipeline: train codebooks → encode DB vectors as m codes → at query, build m×k distance table, sum over codes. No decode for search.
- Trade-off: more bits (larger k or m) improve accuracy but increase memory; product quantization (PQ) explained and the concept of codebooks in PQ give full context.
Encode and decode
Encoding is one pass per vector: split into subvectors, for each subvector find argmin distance to the codebook centroids, output the centroid index. Decoding (reconstructing an approximate vector) is done by replacing each code with the corresponding centroid from the codebook and concatenating. For search we usually don’t decode; we approximate distances by summing, per subvector, the distance from the query subvector to the centroid pointed to by the database code. Those per-subspace distances can be precomputed for the query and stored in a small table, then each database vector’s distance is just a sum of table lookups—very fast.
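The table-based search can be sketched as follows. This is a self-contained toy (random data and codebooks; `adc_search` is a hypothetical helper name), showing the asymmetric pattern: precompute an m×k table of squared distances for the query, then score each database vector as a sum of m lookups, never decoding.

```python
import numpy as np

d, m, k = 128, 8, 256
ds = d // m
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((m, k, ds))      # trained offline in practice

# A database of n vectors, already encoded to m uint8 codes each.
n = 1000
db = rng.standard_normal((n, d))
db_codes = np.stack([
    np.array([np.argmin(np.linalg.norm(codebooks[i] - v.reshape(m, ds)[i], axis=1))
              for i in range(m)], dtype=np.uint8)
    for v in db
])

def adc_search(q):
    """Approximate squared distances: build an m×k table once, then sum lookups."""
    qs = q.reshape(m, ds)
    table = ((codebooks - qs[:, None, :]) ** 2).sum(axis=2)    # shape (m, k)
    # Each DB vector's distance is the sum of the table entries its codes select.
    return table[np.arange(m), db_codes].sum(axis=1)           # shape (n,)

q = rng.standard_normal(d)
approx = adc_search(q)
print(np.argmin(approx))   # index of the approximate nearest neighbor
```

Per database vector the cost is just m table lookups and additions, which is why this scan is so fast.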
Accuracy vs. compression
The loss in accuracy comes from replacing each subvector with the nearest centroid; the finer the codebook (more centroids per subspace), the better the approximation but the more bits per code and the larger the codebooks.
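This effect is easy to demonstrate. The toy experiment below (an assumption-laden sketch: random data, and centroids simply sampled from the data rather than fitted with k-means) measures mean squared reconstruction error for a coarse versus a fine codebook:

```python
import numpy as np

d, m = 32, 4
ds = d // m
rng = np.random.default_rng(1)
data = rng.standard_normal((2000, d))

def recon_error(k):
    """Mean squared reconstruction error with k centroids per subspace.
    Centroids are the first k data subvectors (k-means would do better)."""
    err = 0.0
    subs = data.reshape(-1, m, ds)
    for i in range(m):
        centroids = subs[:k, i]                              # nested sets: first k samples
        dists = np.linalg.norm(subs[:, i, None] - centroids, axis=2)
        err += (dists.min(axis=1) ** 2).mean()               # nearest-centroid error
    return err

print(recon_error(16), recon_error(256))   # error drops as k grows
```

Because the k=16 centroid set is a subset of the k=256 set here, the finer codebook can only reduce the error, illustrating the approximation/bits trade-off.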
Practical tip: start with m=8, k=256; measure recall and compression. Combining IVF and PQ (IVFPQ) uses these codes inside each cluster; optimized product quantization (OPQ) and residual quantization (RQ) improve accuracy at similar or slightly higher cost.
Frequently Asked Questions
How many bits per vector in PQ?
m × log2(k). For m=8, k=256: 8×8 = 64 bits = 8 bytes. Compare to 128-d float32: 512 bytes—64× compression.
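The arithmetic can be checked directly (assuming 128-dimensional float32 vectors, as in the comparison above):

```python
import math

m, k, d = 8, 256, 128
code_bits = m * math.log2(k)        # 8 × 8 = 64 bits = 8 bytes per vector
raw_bytes = d * 4                   # 128 float32 values = 512 bytes
print(code_bits / 8, raw_bytes, raw_bytes / (code_bits / 8))  # 8.0 512 64.0
```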
Why not decode for search?
Decoding would expand codes back to vectors and then compute distance—slower and uses more memory. Lookup-table distance is O(m) adds per vector, no decode.
Are codes stored per vector or per cluster?
Per vector. In IVFPQ, each vector in the inverted list is stored as its PQ codes; codebooks are shared across all vectors.
Can I use PQ without IVF?
Yes. Flat PQ stores only PQ codes, and the query scans all of them. It is still much faster than a full float scan because distances come from table lookups, and memory use is far lower. See IVFPQ for the combined design.