In-memory vs. On-disk Vector Databases
In-memory vector databases keep the ANN index (and often the vectors themselves) in RAM, giving the lowest latency and highest throughput for search, but capacity is limited by memory size and cost. On-disk (or hybrid) systems store the index and vectors on SSD/disk, using layouts such as DiskANN or mmap-backed files so that queries don’t require loading everything into RAM.
Summary
- In-memory: the ANN index and vectors live in RAM; lowest latency and highest throughput, but capacity and cost are limited by memory.
- On-disk: index and vectors on SSD/disk using layouts like DiskANN or mmap; scales to billions of vectors and lowers cost; latency is higher and more variable due to I/O.
- Hybrid / tiered: hot data in RAM, cold data on disk or tiered storage (e.g. S3); a single query path may touch both. Persistence is handled via WAL and compaction; for in-memory systems, restart time depends on how quickly the index can be reloaded from disk.
- Choose in-memory when the dataset fits in RAM and you need single-digit ms p99; choose on-disk or hybrid when you need scale or lower cost per GB.
- Practical tip: estimate index size vs. RAM; set a latency target; use WAL and snapshots for durability even when the live index is in-memory.
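The sizing tip above can be sketched as a back-of-the-envelope calculation. A rough sketch assuming an HNSW-style graph index (float32 vectors plus fixed-degree neighbor links); the parameter names, the degree of 32, and the 10% overhead factor are illustrative assumptions, not measurements:

```python
def estimate_index_bytes(num_vectors: int, dim: int,
                         bytes_per_component: int = 4,  # float32 components
                         graph_degree: int = 32,        # assumed links per node, 4-byte ids
                         overhead: float = 1.1) -> int:
    """Rough index-size estimate: raw vectors + graph links + slack."""
    vectors = num_vectors * dim * bytes_per_component
    links = num_vectors * graph_degree * 4
    return int((vectors + links) * overhead)

# Example: 100M vectors at 768 dims -- does the index fit in 512 GB of RAM?
size = estimate_index_bytes(100_000_000, 768)   # roughly 352 GB
fits = size <= 512 * 1024**3
```

If the estimate lands near your RAM ceiling, remember it excludes query buffers, the OS page cache, and replication overhead, so leave headroom.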
When to choose in-memory
In-memory is ideal when the dataset fits in RAM and you need single-digit-millisecond p99 latency. Hot collections are often kept in memory while cold ones live on disk. Persistence still matters: in-memory databases use write-ahead logs (WALs) and snapshots so that data can be restored after a restart.
Pipeline: all vector and index data in RAM; queries hit memory only; writes go to the WAL first, then to the index. Trade-off: lowest latency and highest QPS vs. limited capacity and higher cost per GB. Practical tip: if your index fits in RAM and you need single-digit-ms p99, in-memory (with WAL and snapshots) is the simplest option.
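The WAL-then-index write order can be sketched in a few lines. A toy sketch, not any particular database's implementation; the class and record format are invented for illustration:

```python
import json
import os
import tempfile

class InMemoryIndex:
    """Toy in-memory write path: append to the WAL, fsync, then update RAM.

    Queries read only self.vectors (pure memory); the WAL exists solely so
    that committed writes survive a crash or restart.
    """

    def __init__(self, wal_path: str):
        self.vectors = {}              # id -> vector, lives in RAM
        self.wal = open(wal_path, "a")

    def upsert(self, vec_id: str, vec: list) -> None:
        record = json.dumps({"id": vec_id, "vec": vec})
        self.wal.write(record + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())    # durable on disk before we acknowledge
        self.vectors[vec_id] = vec     # only now visible to queries

# Usage
wal_path = os.path.join(tempfile.mkdtemp(), "index.wal")
idx = InMemoryIndex(wal_path)
idx.upsert("doc1", [0.1, 0.2])
```

The key ordering is that the fsync happens before the in-RAM update: a write acknowledged to the client is already on disk, even though reads never touch the disk.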
When to choose on-disk or hybrid
On-disk scales to billions of vectors and reduces cost per GB; latency is higher and more variable because queries incur I/O. The storage choice also shapes durability guarantees and operational cost. Hybrid designs (e.g. an mmap-backed index on SSD, or tiered storage) balance capacity and latency.
On-disk pipeline: index and vectors on SSD; query loads only needed pages (e.g. graph nodes, IVF lists); cache hot data in RAM. Hybrid: hot segments in RAM, cold on disk or S3; query may merge results from both. Trade-off: scale and cost vs. latency and implementation complexity.
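The "load only needed pages, cache hot data" part of the on-disk pipeline is essentially an LRU cache in front of the SSD. A minimal sketch with an invented loader standing in for a real SSD read; capacity would be set from available RAM in practice:

```python
from collections import OrderedDict

def load_from_ssd(page_id):
    """Stand-in for reading one index page (graph node / IVF list) from SSD."""
    return f"page-{page_id}"

class PageCache:
    """Toy LRU page cache: queries touch only the pages they need;
    recently used (hot) pages stay in RAM."""

    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self.loader = loader
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)      # mark most recently used
        else:
            self.misses += 1
            self.pages[page_id] = self.loader(page_id)
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
        return self.pages[page_id]

cache = PageCache(capacity=2, loader=load_from_ssd)
for pid in [1, 2, 1, 3, 1]:
    cache.get(pid)
```

On this access pattern the cache serves two hits from RAM and evicts the cold page 2, which is exactly the behavior that keeps on-disk query latency tolerable when the working set is small.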
Persistence and recovery
Even in-memory systems need durability: WAL and snapshots are written to disk so that after a crash or restart, the index can be restored by replaying the log and loading from disk. On-disk designs still use RAM for caching (e.g. graph nodes, codebooks); the full index does not have to fit in memory.
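Recovery as described above (load the snapshot, replay the log on top) can be sketched directly; the record format matches the toy write path, and everything here is illustrative rather than a real database's on-disk format:

```python
import json
import tempfile

def recover(snapshot: dict, wal_path: str) -> dict:
    """Rebuild the in-RAM index after a restart: start from the last
    snapshot, then replay WAL records written since it was taken."""
    index = dict(snapshot)
    with open(wal_path) as wal:
        for line in wal:
            rec = json.loads(line)
            index[rec["id"]] = rec["vec"]   # later records win
    return index

# Usage: one vector from the snapshot, one newer vector from the WAL
wal_file = tempfile.NamedTemporaryFile("w", suffix=".wal", delete=False)
wal_file.write(json.dumps({"id": "b", "vec": [0.3]}) + "\n")
wal_file.close()
restored = recover({"a": [0.1]}, wal_file.name)
```

Snapshots exist to bound this replay: without them, restart time grows with the full history of writes rather than with the writes since the last snapshot.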
Frequently Asked Questions
Can an in-memory VDB be durable?
Yes. WAL and snapshots are written to disk; after restart you replay the log and restore. Durability is about “committed data survives crash,” not where the live index lives.
What is a hybrid setup?
Often: hot data or recent segments in RAM, older segments or cold collections on SSD/disk (or S3), with a single query path that may touch both.
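The "single query path that may touch both" step reduces to merging per-segment candidate lists. A minimal sketch, assuming each segment returns (id, distance) pairs where smaller distance is better; the function and tier names are invented:

```python
import heapq

def merge_topk(ram_hits, disk_hits, k):
    """Merge scored candidates from hot (RAM) and cold (disk) segments
    into a single top-k result, keeping the k smallest distances."""
    return heapq.nsmallest(k, ram_hits + disk_hits, key=lambda h: h[1])

top = merge_topk([("hot1", 0.2), ("hot2", 0.9)],
                 [("cold1", 0.5)], k=2)
```

Note that correctness does not depend on which tier a candidate came from; only latency differs, which is why cold segments dominate tail latency in hybrid setups.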
Does on-disk mean no RAM usage?
No. On-disk indexes still use RAM for caching (e.g. graph nodes, codebooks). The difference is that the full index doesn’t have to fit in memory.
How do I decide for my workload?
Estimate index size and compare to available RAM; set a latency target. If you fit in RAM and need low latency, in-memory (with persistence) is simpler. If you don’t fit or want lower cost, use on-disk or tiered.
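That decision rule can be written down as a toy heuristic. The 10 ms cutoff and the returned labels are illustrative assumptions, not product recommendations:

```python
def pick_deployment(index_bytes: int, ram_bytes: int,
                    p99_target_ms: float) -> str:
    """Toy version of the text's decision rule: fit in RAM + tight
    latency target -> in-memory; otherwise weigh cost vs. scale."""
    if index_bytes <= ram_bytes:
        if p99_target_ms < 10:
            return "in-memory (with WAL + snapshots)"
        return "in-memory, or hybrid if cost per GB matters"
    return "on-disk or tiered"

# Usage: a ~352 GB index, 512 GB of RAM, 8 ms p99 target
choice = pick_deployment(352 * 10**9, 512 * 2**30, p99_target_ms=8)
```

In practice the inputs are the uncertain part: measure the index size on a real sample of your data before trusting any such rule.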