← All topics

Performance, Evaluation & Benchmarking · Topic 171

Memory usage per million vectors

Memory per million vectors is a standard way to compare how much RAM different vector indexes and storage layouts use at scale. It helps you size hardware and choose between in-memory, on-disk, or hybrid designs.

Summary

  • Memory per million vectors is a standard way to compare the RAM use of different vector indexes and storage layouts at scale; it helps size hardware and choose between in-memory, on-disk, and hybrid designs.
  • Raw vectors dominate (e.g. 1M × 768 float32 ≈ 3 GB). Index overhead varies: HNSW is often several times the raw size, while IVF is often lower. Scalar or product quantization can cut memory 2–4× with some accuracy loss.
  • When reporting, include the vector dimension, dtype, index type, and whether metadata is counted; measure RSS or process memory after loading the index, and report MB per million vectors for fair comparison. The metric is essential for billion-scale planning.

What consumes memory

Raw vectors dominate: for 1M vectors of dimension 768 in float32, the vectors alone need 768 × 4 bytes × 1M ≈ 3 GB. Index structures add overhead on top of that: HNSW stores graph links and typically uses several times the raw vector size, while IVF, with its centroids and inverted lists, usually adds less. Scalar or product quantization can cut memory by 2–4× with some accuracy loss.
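The raw-vector arithmetic above is easy to script. This is a minimal sketch (the function name `raw_vector_bytes` is illustrative, not from any library):

```python
def raw_vector_bytes(n_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Memory for the raw vectors alone, excluding index overhead."""
    return n_vectors * dim * bytes_per_component

# 1M vectors, dimension 768, float32 (4 bytes per component)
total = raw_vector_bytes(1_000_000, 768, 4)
print(f"{total / 1e9:.2f} GB")  # → 3.07 GB
```

Swapping `bytes_per_component` to 1 models an int8 layout, which is where the 4× scalar-quantization saving in the next paragraph comes from.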

The trade-off: more aggressive quantization saves memory but can hurt recall. To measure, record the process's RSS (resident set size) after loading the index. As a practical tip, report MB per million vectors together with the dimension and dtype so comparisons are fair.
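One portable way to take the "RSS after load" measurement is the standard-library `resource` module. A sketch, assuming the list comprehension stands in for your actual index-loading step; note `ru_maxrss` is the peak RSS and its unit differs by platform (kilobytes on Linux, bytes on macOS):

```python
import resource
import sys

def peak_rss_bytes() -> int:
    """Peak resident set size of the current process, normalized to bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

before = peak_rss_bytes()
n_vectors = 10_000
vectors = [[0.0] * 768 for _ in range(n_vectors)]  # stand-in for loading an index
after = peak_rss_bytes()

# Normalize the growth to MB per million vectors
mb_per_million = (after - before) / 1e6 * (1_000_000 / n_vectors)
print(f"≈ {mb_per_million:.0f} MB per million vectors")
```

Peak RSS only grows, so the before/after delta captures allocation during load; for a long-running process you may prefer a current-RSS reading (e.g. via `psutil`) instead.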

Reporting the metric

When reporting “memory per million vectors,” include: vector dimension, dtype (float32/int8/etc.), index type, and whether metadata or payloads are counted. This metric is essential for billion-scale planning and for comparing in-memory vs. on-disk trade-offs.
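A report entry that carries all four fields might look like the sketch below (the field names and example values are illustrative, not a standard schema):

```python
report = {
    "index_type": "HNSW",           # example value
    "dimension": 768,
    "dtype": "float32",
    "includes_metadata": False,
    "mb_per_million_vectors": 3072,  # measured RSS delta, normalized
}

line = (f"{report['index_type']} d={report['dimension']} {report['dtype']} "
        f"metadata={report['includes_metadata']}: "
        f"{report['mb_per_million_vectors']} MB / 1M vectors")
print(line)
```

Keeping every row of a comparison table in this shape prevents the common mistake of comparing one system's vectors-only number against another's vectors-plus-payload number.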

Frequently Asked Questions

Why measure memory per million vectors?

It is a standard way to compare how much RAM different vector indexes and storage layouts use at scale. It helps you size hardware and choose between in-memory, on-disk, and hybrid designs, and it is essential for billion-scale planning.

What dominates memory usage?

Raw vectors (e.g. 1M × 768 float32 ≈ 3 GB). Index structures add overhead: HNSW typically several times raw size; IVF often lower. Scalar or product quantization can cut memory 2–4× with some accuracy loss.

What should I include when reporting this metric?

Include the vector dimension, dtype (float32/int8/etc.), index type, and whether metadata or payloads are counted. This enables fair comparison across indexes and storage designs.

How does quantization affect memory?

Scalar quantization (float32→int8) and product quantization reduce memory by 2–4× with some recall loss. See accuracy vs. speed in quantization and compression ratio vs. accuracy.
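The 2–4× figure follows directly from component widths. A minimal sketch of the arithmetic (the function name `quantized_bytes` is illustrative; `codebook_bytes` models the small centroid/codebook overhead that product quantization adds):

```python
def quantized_bytes(n_vectors: int, dim: int,
                    bytes_per_component: int = 1,
                    codebook_bytes: int = 0) -> int:
    """Memory after quantization: narrower components plus any codebook overhead."""
    return n_vectors * dim * bytes_per_component + codebook_bytes

full = 1_000_000 * 768 * 4                 # float32 baseline ≈ 3.07 GB
sq8 = quantized_bytes(1_000_000, 768, 1)   # int8: 1 byte per component
print(f"compression: {full / sq8:.0f}x")   # → 4x
```

Product quantization can push the per-vector footprint below one byte per component, but the recall loss grows with compression, which is the accuracy-vs-speed trade-off referenced above.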