Residual Quantization (RQ)
Residual quantization (RQ) works in stages: first quantize the vector (e.g. to a codebook centroid), then quantize the residual (the error vector). Repeating this yields a multi-stage code that can achieve a better compression-accuracy trade-off than single-stage product quantization (PQ). RQ is used in some high-recall ANN indexes where memory is constrained but distortion must stay low.
Summary
- Stage 1: quantize x → c₁; residual r₁ = x − c₁. Stage 2: quantize r₁ → c₂; residual r₂ = r₁ − c₂; repeat. The vector is stored as a sequence of stage codes. Better reconstruction than single-stage PQ, at the cost of more codebooks and decode work. Compare to OPQ.
- Pipeline: train codebooks per stage → encode by successive residual quantization → decode by summing centroids. Distance computed across stages (asymmetric or multi-table).
- Trade-off: higher accuracy than single-stage PQ at a similar bit rate; more complex build and distance computation. See "Product quantization (PQ) explained" and "How PQ compresses vectors into codes" for the single-stage base.
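The "train codebooks per stage" step of the pipeline can be sketched in NumPy. This is a minimal illustration with a plain Lloyd k-means loop; `train_rq_codebooks` is a hypothetical helper, and production systems would use an optimized k-means implementation instead:

```python
import numpy as np

def train_rq_codebooks(X, n_stages=3, k=256, n_iter=10, seed=0):
    """Train one k-means codebook per stage on the residuals of the previous stage."""
    rng = np.random.default_rng(seed)
    residual = X.astype(np.float64).copy()
    codebooks = []
    for _ in range(n_stages):
        # Plain Lloyd k-means on the current residuals.
        centroids = residual[rng.choice(len(residual), k, replace=False)]
        for _ in range(n_iter):
            dists = ((residual[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(1)
            for j in range(k):
                members = residual[assign == j]
                if len(members):
                    centroids[j] = members.mean(0)
        codebooks.append(centroids.copy())
        residual = residual - centroids[assign]  # next stage trains on the error
    return codebooks
```

Each stage's codebook is fit to whatever error the previous stages left behind, which is why later stages keep shrinking the distortion.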
Algorithm
At each stage, you have a residual vector (initially the original vector). You quantize it to the nearest centroid in that stage’s codebook, append the centroid index to the code, and set the new residual to the difference (residual minus centroid). The next stage quantizes this new residual with its own codebook. Decoding sums the centroids from all stages; the code is the concatenation of stage indices. Multiple stages allow the representation to refine the approximation, reducing distortion compared to a single codebook at the same total bit rate in many cases.
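The encode/decode steps above can be sketched directly. A minimal NumPy version, assuming codebooks have already been trained (the function names `rq_encode`/`rq_decode` are hypothetical):

```python
import numpy as np

def rq_encode(x, codebooks):
    """Encode x as one centroid index per stage."""
    codes = []
    residual = x.astype(np.float64).copy()
    for C in codebooks:
        idx = int(((C - residual) ** 2).sum(axis=1).argmin())  # nearest centroid
        codes.append(idx)
        residual = residual - C[idx]  # pass the error on to the next stage
    return codes

def rq_decode(codes, codebooks):
    """Reconstruct by summing the selected centroid from each stage."""
    return sum(C[i] for i, C in zip(codes, codebooks))
```

Note the sequential dependence: each stage's index depends on what the earlier stages already explained, unlike PQ's independent subvector codes.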
Trade-offs
RQ increases the number of codebooks and the complexity of distance computation (you need to combine distances across stages or use asymmetric distance). It is used in research and in some production systems that need very high recall with limited memory; plain PQ or OPQ is more common when simplicity and speed are priorities.
Practical tip: consider RQ when PQ/OPQ recall is not enough and you can afford multi-stage training and distance computation. The general points in "Trade-off between compression ratio and accuracy" and "How quantization reduces memory footprint" apply; optimized product quantization (OPQ) is a simpler alternative that often suffices.
Frequently Asked Questions
How many stages in RQ?
Typically 2–4. More stages improve reconstruction but add codebooks and compute; diminishing returns after a few stages.
Is RQ the same as PQ with more subvectors?
No. PQ splits the vector into subvectors and quantizes each independently. RQ quantizes the full vector, then the full residual; the structure is sequential, not parallel.
How is RQ distance computed at query time?
Often via asymmetric distance: precompute, per stage, a table of the query's inner product (or distance) with each centroid, then for each database vector combine the table entries for its codes. For exact squared L2 the stage-wise squared distances cannot simply be summed (cross-terms between stages matter), so implementations typically use inner-product tables plus a reconstruction norm stored per vector. Either way, it costs more lookups per distance than single-stage PQ.
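To make this concrete: expanding ‖q − x̂‖² = ‖q‖² − 2⟨q, x̂⟩ + ‖x̂‖², the inner product ⟨q, x̂⟩ splits into per-stage table lookups, and ‖x̂‖² can be stored per vector at index time. A NumPy sketch under those assumptions (`rq_adc_distances` is a hypothetical helper; `codes` is an (n, n_stages) int array):

```python
import numpy as np

def rq_adc_distances(q, codes, codebooks, recon_norms):
    """Exact asymmetric squared-L2 distances via per-stage inner-product tables.

    ||q - x_hat||^2 = ||q||^2 - 2<q, x_hat> + ||x_hat||^2, where
    <q, x_hat> is the sum over stages of <q, selected centroid>.
    recon_norms holds ||x_hat||^2, precomputed when vectors were indexed.
    """
    tables = [C @ q for C in codebooks]      # one lookup table per stage
    ip = np.zeros(len(codes))
    for s, table in enumerate(tables):
        ip += table[codes[:, s]]             # gather per-stage inner products
    return (q @ q) - 2.0 * ip + recon_norms
```

The per-query cost is one small matrix-vector product per stage to build the tables, then a few lookups and adds per database vector.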
When to use RQ vs. OPQ?
OPQ improves PQ by rotation; RQ improves by multi-stage refinement. OPQ is simpler and widely supported; RQ when you need maximum accuracy at limited bits and can afford the cost.