HNSW on disk vs. HNSW in RAM
HNSW in RAM keeps the graph and (typically) full vectors in memory for minimal latency and maximum throughput. HNSW on disk stores the index and often vectors on SSD or other storage to reduce cost and scale beyond RAM, at the cost of higher latency and more complex access patterns.
Summary
- In-memory: random access, no disk I/O; latency dominated by CPU and efSearch; memory grows with n and M.
- On-disk (e.g. DiskANN, Vamana): bounded reads per query, caching of hot regions; higher p99 latency; weigh index-loading time against latency requirements.
- Pipeline: choose in-memory when index fits RAM and latency SLA is strict; use disk-backed or hybrid when scale or cost demands it.
- Trade-off: RAM gives lowest latency and simplest ops; disk reduces cost and supports billion-scale with bounded I/O per query.
- Further reading: overcoming RAM limitations for billion-scale vectors and tiered storage describe scaling and cost options beyond single-node RAM.
In-memory HNSW
In-memory HNSW benefits from random access to any node and neighbor list with no disk I/O. Query latency is dominated by CPU and efSearch; memory consumption grows with dataset size and M. This is the default for many vector DBs when the dataset fits in RAM.
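To make the efSearch trade-off concrete, here is a minimal pure-Python sketch of the beam search HNSW runs at its base layer over an in-memory graph. The graph, vectors, and the name `greedy_search` are illustrative; real implementations work on contiguous arrays, but the logic is the same: a larger `ef` explores more candidates, raising recall at the cost of CPU.

```python
import heapq

def l2(a, b):
    # Squared Euclidean distance (order-preserving, so no sqrt needed).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(graph, vectors, query, entry, ef):
    """Beam search over an in-memory neighbor graph (HNSW base layer).
    graph: node id -> list of neighbor ids; vectors: node id -> vector.
    Returns up to ef (distance, node) pairs, nearest first."""
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: frontier to expand
    results = [(-d0, entry)]     # max-heap: best ef found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop when the closest frontier node is worse than the worst result.
        if d > -results[0][0] and len(results) >= ef:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = l2(vectors[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop current worst
    return sorted((-d, n) for d, n in results)
```

Every step touches arbitrary nodes, which is why this loop is cheap in RAM and expensive when each neighbor lookup is a disk read.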
Practical tip: estimate memory (vectors + graph) before scaling; use memory usage per million vectors as a reference. If the index exceeds available RAM, plan for quantization, a disk-backed layout, or sharding. Also budget for warm-up time when loading an in-memory index from disk at startup.
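A back-of-envelope estimate is enough for this planning step. The sketch below assumes float32 vectors and the common rule of thumb of roughly 2·M links per node at layer 0 plus M links averaged across upper layers; the function name and link accounting are illustrative, not a library API.

```python
def hnsw_memory_bytes(n, dim, M, bytes_per_dim=4, link_bytes=4):
    """Rough HNSW footprint: full-precision vectors plus graph links.
    Assumes float32 vectors (4 bytes/dim) and ~2*M + M links per node
    (layer 0 plus upper layers), each stored as a 4-byte id."""
    vectors = n * dim * bytes_per_dim
    links = n * (2 * M + M) * link_bytes
    return vectors + links

# Example: 10M 768-dim float32 vectors, M=16
total = hnsw_memory_bytes(10_000_000, 768, 16)
print(f"{total / 2**30:.1f} GiB")  # -> 30.4 GiB
```

If that figure is near or above your instance's RAM, that is the signal to consider quantization, sharding, or the disk-backed designs below.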
On-disk designs
On-disk designs (e.g. DiskANN, Vamana) lay out the graph and vectors so that a query triggers a bounded number of sequential or near-sequential reads, avoiding random seeks. Caching hot portions in RAM helps. The trade-off is higher p99 latency and more engineering effort for persistence, loading, and updates. Choosing between RAM and disk depends on scale, budget, and latency requirements.
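The "bounded number of reads" claim turns directly into a latency budget. The numbers below (graph hops per query, one block read per hop, ~100 µs per NVMe read) are illustrative assumptions, not measurements, but the arithmetic shows why caching dominates the outcome:

```python
def query_latency_us(hops, reads_per_hop, read_latency_us, cache_hit_rate):
    """Back-of-envelope disk-graph query latency: only cache misses pay
    the storage read latency; cache hits are treated as free here."""
    reads = hops * reads_per_hop
    misses = reads * (1 - cache_hit_rate)
    return misses * read_latency_us

# Illustrative: ~60 hops, 1 block read per hop, 100 us per NVMe read.
cold = query_latency_us(60, 1, 100, 0.0)   # no cache: 6000 us = 6 ms
warm = query_latency_us(60, 1, 100, 0.9)   # 90% cached: ~600 us
```

This is why on-disk p50 can approach in-memory latency once hot regions are cached, while cold or cache-missing queries set the p99.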
DiskANN and Vamana are optimized for SSD access patterns; index persistence and loading indexes from disk matter for startup and recovery. Memory-mapped files (mmap) can expose an on-disk index to the OS page cache for a hybrid of disk capacity and opportunistic RAM caching.
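The mmap hybrid can be shown with the standard library alone. This sketch stores vectors as a flat file of little-endian float32 and reads one vector by offset through a memory mapping; the file layout and helper names are illustrative, but the key behaviour is real: the OS page cache, not the application, decides which pages stay resident in RAM.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
VEC_BYTES = DIM * 4  # float32

def write_vectors(path, vectors):
    """Store vectors as a flat array of little-endian float32."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack(f"<{DIM}f", *v))

def read_vector(mm, i):
    """Read vector i straight from the mapping; untouched pages
    never leave disk, hot pages stay cached by the OS."""
    off = i * VEC_BYTES
    return struct.unpack(f"<{DIM}f", mm[off:off + VEC_BYTES])

path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_vectors(path, [[float(i)] * DIM for i in range(1000)])
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    v = read_vector(mm, 42)  # -> (42.0, 42.0, 42.0, 42.0)
    mm.close()
```

A real index would map the graph adjacency lists the same way, which is why mmap-backed latency is variable: a query over cached pages is RAM-fast, while a page fault pays full storage latency.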
Pipeline and when to choose
Use in-memory HNSW when the full index fits in RAM and you need the lowest latency. Use a disk-backed (or mmap) index when it is too large for RAM or when cost favors smaller instances with local disk. Tiered storage (e.g. moving cold vectors to S3) and overcoming RAM limitations for billion-scale vectors describe further scaling options.
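The decision above can be condensed into a toy rule of thumb. The thresholds here (30% RAM headroom, a 10 ms p99 floor for disk) are illustrative assumptions, not recommendations; tune them to your workload.

```python
def choose_backend(index_gib, ram_gib, p99_budget_ms):
    """Toy decision rule mirroring the guidance above.
    Thresholds are illustrative assumptions, not recommendations."""
    if index_gib <= 0.7 * ram_gib:      # fits with headroom for queries
        return "in-memory"
    if p99_budget_ms >= 10:             # disk p99 is tolerable
        return "disk-backed"
    return "shard-or-quantize"          # too big for RAM, SLA too strict

print(choose_backend(30, 64, 5))    # -> in-memory
print(choose_backend(200, 64, 20))  # -> disk-backed
print(choose_backend(200, 64, 5))   # -> shard-or-quantize
```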
Frequently Asked Questions
Can I mmap the HNSW index?
Yes. Memory-mapping the index file lets the OS page in regions on demand. Latency can be more variable than full RAM; see memory-mapped files.
How much slower is disk HNSW?
It depends on layout and caching. p99 is typically several milliseconds to tens of milliseconds, versus sub-millisecond for in-memory; with good caching, p50 can come close to in-memory latency.
When should I use disk-backed HNSW?
When the index doesn’t fit in RAM or you want to reduce cost; when p99 latency in the 10–50 ms range is acceptable.
Does Vamana differ from HNSW?
Yes. Vamana is a disk-oriented graph index with a different build procedure and layout; the Vamana entry covers it in detail. It is conceptually similar to HNSW (graph + greedy search) but optimized for SSD access.