
Indexing Algorithms - IVF & Quantization · Topic 94

Binary Quantization (BQ) and its speed advantages

Binary quantization (BQ) maps each dimension of a vector to a single bit (e.g. by thresholding at zero after normalization). Vectors become compact bitstrings, and similarity can be computed with Hamming distance or popcount on XOR—extremely fast on modern CPUs (SIMD) and with minimal memory. The trade-off is lower fidelity than scalar or product quantization, but BQ is attractive when speed and memory footprint matter more than accuracy.

Summary

  • Binary quantization (BQ) maps each dimension to one bit (e.g. sign after normalization); a 768-d float vector becomes 96 bytes. Distance is Hamming (XOR + popcount)—extremely fast and SIMD-friendly.
  • Sign-based (threshold at 0) or learned binary codes; use for first-stage retrieval or with graph/IVF indexes. Trade-off: highest compression and speed, lower fidelity than SQ or PQ—see accuracy vs. speed in quantization.
  • Best when speed and memory footprint matter more than accuracy: filtering large candidate sets, resource-constrained environments, or re-ranking after BQ retrieval.
  • For 0/1 vectors, Hamming distance equals squared L2; for cosine similarity, BQ is only an approximation. Combine with full-precision or higher-fidelity re-ranking for best quality.
  • Pipeline: binarize at ingest (e.g. sign per dimension); store bits; at query binarize query and compute Hamming (XOR + popcount) over candidate set. Use as first stage then re-rank with full vectors or SQ/PQ.

Encoding and distance

The simplest BQ is sign binarization: after normalization (or centering), each dimension maps to 1 if it is non-negative and 0 otherwise. So a 768-d float vector becomes 768 bits = 96 bytes. Distance is Hamming: XOR the two bitstrings and count set bits (popcount). Modern CPUs have fast popcount instructions and SIMD can process many dimensions at once, so billions of Hamming comparisons per second are possible.
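The encoding and distance above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: `binarize` and `hamming` are hypothetical helper names, and the popcount here goes through `np.unpackbits` rather than a hardware popcount instruction.

```python
import numpy as np

def binarize(v):
    # Sign binarization: 1 where the dimension is >= 0, else 0,
    # packed 8 dimensions per byte.
    return np.packbits(v >= 0)

def hamming(a_bits, b_bits):
    # XOR the packed bytes, then count the set bits (popcount).
    return int(np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum())

rng = np.random.default_rng(0)
q = rng.standard_normal(768)
x = rng.standard_normal(768)

qb, xb = binarize(q), binarize(x)
print(qb.nbytes)        # 96 bytes for a 768-d vector
print(hamming(qb, xb))  # Hamming distance in [0, 768]
```

Note that `qb.nbytes` is 96, matching the 768 bits = 96 bytes figure above.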

When to use BQ

BQ gives the highest compression (32× vs. float32) and the fastest distance (no multiplies, just XOR + popcount), but loses the most information. It is used for first-stage filtering (get a large candidate set quickly, then re-rank with full-precision or higher-fidelity codes), for very resource-constrained environments, and in learned binary hashing schemes where the binarization is learned to preserve similarity.
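The 32× compression figure follows directly from the storage arithmetic, as this trivial sketch shows:

```python
d = 768
float_bytes = d * 4   # float32: 4 bytes per dimension
bq_bytes = d // 8     # BQ: 1 bit per dimension
print(float_bytes, bq_bytes, float_bytes // bq_bytes)  # 3072 96 32
```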

Trade-offs and combining with other methods

BQ trades accuracy for speed: coarser than scalar or product quantization. In practice, use BQ to quickly narrow the candidate set, then apply re-ranking with full-precision vectors or a cross-encoder. BQ can also be combined with HNSW or IVF for hybrid pipelines. Measure recall@k and latency to confirm it meets your SLO; see accuracy vs. speed in quantization for the full trade-off picture.

Practical tip: use BQ for a fast first stage (e.g. retrieve the top-500 by Hamming distance), then re-rank with full-precision or scalar/PQ distance. See also: Hamming distance for binary vectors, the trade-off between compression ratio and accuracy, and how quantization reduces memory footprint for the memory impact.
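The two-stage pipeline described above can be sketched as follows. This is an illustrative sketch under assumed names (`binarize`, `hamming_to_all` are hypothetical helpers; the database is random data), using a byte-wise popcount lookup table rather than hardware popcount:

```python
import numpy as np

def binarize(X):
    # Pack sign bits: (n, d) floats -> (n, d/8) uint8.
    return np.packbits(X >= 0, axis=-1)

def hamming_to_all(q_bits, db_bits):
    # XOR each database row with the query, then popcount
    # each byte via a 256-entry lookup table.
    lut = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    return lut[np.bitwise_xor(db_bits, q_bits)].sum(axis=1)

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 768)).astype(np.float32)
q = rng.standard_normal(768).astype(np.float32)

db_bits, q_bits = binarize(db), binarize(q)

# Stage 1: cheap Hamming scan over all rows, keep top-500 candidates.
cand = np.argsort(hamming_to_all(q_bits, db_bits))[:500]

# Stage 2: re-rank only the candidates with full-precision cosine.
scores = db[cand] @ q / (np.linalg.norm(db[cand], axis=1) * np.linalg.norm(q))
top10 = cand[np.argsort(-scores)[:10]]
```

Stage 1 touches only 96 bytes per row; stage 2 pays the full-precision cost on just 500 rows instead of 10,000.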

Frequently Asked Questions

Is Hamming distance the same as L2 on binary vectors?

For 0/1 vectors, Hamming distance equals squared L2 distance: each differing bit contributes exactly 1 to both. Nearest-neighbor search by Hamming is therefore the same as by L2 for binary vectors.
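A quick sanity check of this equivalence on a toy pair of 0/1 vectors:

```python
import numpy as np

a = np.array([1, 0, 1, 1, 0], dtype=np.int8)
b = np.array([0, 0, 1, 0, 1], dtype=np.int8)

hamming = int((a != b).sum())        # count of differing bits
sq_l2 = int(((a - b) ** 2).sum())    # squared L2 distance

print(hamming, sq_l2)  # 3 3 -- identical for 0/1 vectors
```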

Can I use BQ with cosine?

After normalization, sign binarization discards magnitude information; Hamming distance on the resulting bits is a proxy for the angle between vectors. For true cosine similarity you need full-precision or higher-precision codes; BQ is an approximation.

What are “learned” binary codes?

Learned binary codes are produced by models (e.g. neural networks) trained to map vectors to bits so that Hamming distance approximates semantic similarity. They typically outperform plain sign binarization when trained on task data.

How fast is Hamming vs. float L2?

Orders of magnitude faster: no floating-point arithmetic and minimal memory bandwidth. Popcount is a single instruction, and SIMD instructions can process 256 or more bits at a time.