Scalar Quantization (SQ): Moving from Float32 to INT8
Scalar quantization (SQ) converts each vector component from a full-precision type (e.g. Float32) to a smaller integer type (e.g. INT8 or UINT8). Each dimension is quantized independently, usually by mapping the range of values in that dimension to a fixed number of levels. The result is a large reduction in memory footprint and often faster distance computation, with a controllable trade-off between compression and accuracy.
Summary
- Per-dimension min-max or range mapping to e.g. [0, 255]; the query is quantized the same way and distances computed on integers. Simpler than PQ (no codebooks). Often combined with IVF. Sits between raw float and binary quantization on the accuracy vs. speed spectrum.
- Pipeline: compute per-dimension (or per-vector) scale/offset at build; quantize stored vectors and query to INT8; compute distance on integers. Store scale/offset for dequantization if needed.
- Trade-off: 4× memory reduction vs. Float32; small accuracy loss; integer ops often faster. Signed embeddings require handling negative values (symmetric range or offset mapping).
- The general points about how quantization reduces memory footprint, and the trade-off between compression ratio and accuracy, apply here as well. IVFPQ combines IVF with PQ inside each cell; SQ is a simpler alternative in that role (IVF + SQ).
How SQ works
A common approach is per-dimension min-max or range-based quantization: compute the min and max (or robust bounds such as percentiles) per dimension, then map each float value to an integer in [0, 255] for 8-bit. At search time, the query can be quantized the same way and distances computed directly on integers (e.g. L2 on INT8), which is faster and more cache-friendly than float. Some systems instead keep the query in full precision and compare it against the quantized stored vectors (asymmetric distance computation), trading a little speed for accuracy. The main downside is that very low bit widths (e.g. 4-bit) can hurt recall; 8-bit scalar quantization usually gives a good balance and is widely supported (e.g. in Faiss).
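A minimal NumPy sketch of this scheme, assuming per-dimension min-max bounds and an exhaustive integer L2 scan (the function names are illustrative, not any library's API):

```python
import numpy as np

def fit_bounds(vectors):
    """Per-dimension min/max bounds and the scale mapping [min, max] -> [0, 255]."""
    lo = vectors.min(axis=0)
    scale = (vectors.max(axis=0) - lo) / 255.0
    scale[scale == 0] = 1.0  # guard against constant dimensions
    return lo, scale

def quantize_uint8(vectors, lo, scale):
    """Round each component to the nearest of 256 levels; out-of-range values are clipped."""
    return np.clip(np.round((vectors - lo) / scale), 0, 255).astype(np.uint8)

def l2_sq_int(query_code, db_codes):
    """Squared L2 on integer codes; widen to int32 so the sums cannot overflow."""
    diff = db_codes.astype(np.int32) - query_code.astype(np.int32)
    return (diff * diff).sum(axis=1)

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 128)).astype(np.float32)
query = rng.standard_normal(128).astype(np.float32)

lo, scale = fit_bounds(db)                  # build time: learn scale/offset
db_codes = quantize_uint8(db, lo, scale)    # quantize stored vectors
query_code = quantize_uint8(query[None, :], lo, scale)[0]  # quantize query
nearest = int(np.argmin(l2_sq_int(query_code, db_codes)))
```

With bounds learned from the stored vectors, query components that fall outside them are simply clipped, and dequantizing with `code * scale + lo` recovers each stored value to within half a quantization step.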
Compared to PQ
SQ is simpler than product quantization (PQ): no codebooks or subvectors, just a scale and offset per dimension (or per vector). It’s often combined with IVF (IVF + SQ) for both reduced search space and smaller vectors in memory.
Practical tip: use symmetric INT8 for signed embeddings (e.g. [-128, 127]); for unsigned or non-negative data, map [min, max] to [0, 255]. The memory and accuracy trade-offs above apply either way; note also that the computation overhead of different distance metrics changes when distances are computed in integer arithmetic.
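A sketch of the symmetric variant, clipping to [-127, 127] so the range stays symmetric around zero (one common convention; the names here are illustrative):

```python
import numpy as np

def quantize_symmetric_int8(vectors):
    """Symmetric INT8: zero maps exactly to zero; only a scale per dimension is stored."""
    max_abs = np.abs(vectors).max(axis=0)
    max_abs[max_abs == 0] = 1.0          # avoid division by zero on dead dimensions
    scale = max_abs / 127.0              # per-dimension scale, no offset needed
    codes = np.clip(np.round(vectors / scale), -127, 127).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(1)
x = rng.standard_normal((100, 64)).astype(np.float32)
codes, scale = quantize_symmetric_int8(x)
approx = codes.astype(np.float32) * scale    # dequantize: codes * scale, no offset
```

Symmetric quantization needs only a scale per dimension, while the offset-based [0, 255] mapping needs both min and scale; in exchange, the offset variant uses the full 256 levels for skewed value ranges.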
Frequently Asked Questions
How much memory does INT8 save vs. Float32?
4×: 1 byte per dimension vs. 4. For 768-d, ~0.75 KB vs. ~3 KB per vector.
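The arithmetic, spelled out:

```python
dims = 768

float32_bytes = dims * 4            # 3072 bytes ≈ 3 KB per vector
int8_bytes = dims * 1               # 768 bytes ≈ 0.75 KB per vector
ratio = float32_bytes / int8_bytes  # 4x

# At scale, e.g. one million vectors:
n = 1_000_000
saved_gib = (float32_bytes - int8_bytes) * n / 2**30  # roughly 2.1 GiB saved
```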
Do I need to store scale/offset?
Yes, per dimension (or per vector) to dequantize or to compute distances. Overhead is small: e.g. 768 × 8 bytes ≈ 6 KB for 768-d with a Float32 scale and offset per dimension, stored once and shared by all vectors.
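For illustration, the per-index overhead and a dequantization round trip, using dummy scale/offset values (the numbers match the 768-d case):

```python
import numpy as np

dims = 768
lo = np.full(dims, -1.0, dtype=np.float32)            # per-dimension offset (dummy)
scale = np.full(dims, 2.0 / 255.0, dtype=np.float32)  # per-dimension scale (dummy)

# Stored once per index, shared by every vector:
overhead_bytes = lo.nbytes + scale.nbytes  # 768 * (4 + 4) = 6144 bytes

# Dequantize a stored UINT8 code back to approximate floats.
code = np.random.default_rng(2).integers(0, 256, dims, dtype=np.uint8)
approx = code.astype(np.float32) * scale + lo         # values land back in [-1, 1]
```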
Can SQ handle negative values?
Yes. Use symmetric INT8 (e.g. [-128, 127]) or an offset mapping: map [min, max] to [0, 255] and store min/max.
Is SQ faster than float for distance?
Often yes: integer ops and better cache utilization. SIMD can do more INT8 ops per cycle than float on many CPUs.
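Production kernels (e.g. Faiss's scalar quantizer code paths) use SIMD integer instructions; a small NumPy sketch of the one correctness detail that matters regardless of implementation, widening before subtraction so unsigned codes don't wrap:

```python
import numpy as np

a = np.array([200, 10], dtype=np.uint8)
b = np.array([10, 200], dtype=np.uint8)

# Pitfall: uint8 subtraction wraps modulo 256 (10 - 200 -> 66, not -190).
wrapped = a - b

# Widen to int32 first: differences and the sum of squares stay exact.
diff = a.astype(np.int32) - b.astype(np.int32)
dist = int((diff * diff).sum())  # 190**2 + (-190)**2 = 72200
```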