
Database Internals & Storage · Topic 106

In-memory vs. On-disk Vector Databases

In-memory vector databases keep the ANN index (and often the raw vectors) in RAM, giving the lowest latency and highest throughput for search, but capacity is limited by memory size and cost. On-disk (or hybrid) systems store the index and vectors on SSD/disk, using layouts like DiskANN or mmap so that queries don’t require loading everything into RAM.

Summary

  • In-memory: the ANN index and vectors live in RAM; lowest latency and highest throughput, but capacity and cost are limited by memory.
  • On-disk: index and vectors live on SSD/disk using layouts like DiskANN or mmap; scales to billions of vectors at lower cost, but latency is higher and more variable due to I/O.
  • Hybrid / tiered: hot data in RAM, cold data on disk or tiered storage (e.g. S3); a single query path may touch both. Durability comes from a WAL and compaction; for in-memory systems, how quickly the index can be reloaded from disk determines restart time.
  • Choose in-memory when the dataset fits in RAM and you need single-digit ms p99; choose on-disk or hybrid when you need scale or lower cost per GB.
  • Practical tip: estimate index size vs. RAM; set a latency target; use WAL and snapshots for durability even when the live index is in-memory.
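The sizing tip above can be turned into a quick back-of-the-envelope estimate. This sketch assumes an HNSW-style index (raw float32 vectors plus graph links); the per-vector link count is an assumption to tune for your index type:

```python
def estimate_index_bytes(num_vectors: int, dim: int,
                         bytes_per_float: int = 4,
                         links_per_vector: int = 32,
                         bytes_per_link: int = 4) -> int:
    """Rough HNSW-style index size: raw vectors + graph links.

    links_per_vector is an assumption (roughly 2*M for HNSW with
    M=16); real overhead depends on the index type and implementation.
    """
    raw = num_vectors * dim * bytes_per_float          # vector payload
    graph = num_vectors * links_per_vector * bytes_per_link  # graph edges
    return raw + graph

# 10M vectors of dim 768: does the index fit in 64 GiB of RAM?
size = estimate_index_bytes(10_000_000, 768)   # ~32 GB
fits_in_ram = size < 64 * 1024**3              # True, with headroom
```

If the estimate lands close to your RAM budget, remember to leave headroom for query buffers, the OS, and index growth.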

When to choose in-memory

In-memory is ideal when the dataset fits in RAM and you need single-digit millisecond p99. Hot collections are often kept in memory while cold ones live on disk. Persistence still matters: in-memory DBs use write-ahead logs and snapshots so that data can be restored after restart.

Pipeline: all vector and index data in RAM; queries hit memory only; writes go to the WAL first, then the index. Trade-off: lowest latency and highest QPS vs. limited capacity and higher cost per GB. Practical tip: if your index fits in RAM and you need single-digit-ms p99, in-memory (with WAL and snapshots) is the simplest design.
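The write path above (WAL first, then the in-memory index) can be sketched roughly like this; the class name and record format are illustrative, not a real library API:

```python
import json
import os

class InMemoryVectorStore:
    """Minimal sketch of the in-memory write path: append to a WAL,
    fsync it, then update the RAM-resident index. Illustrative only."""

    def __init__(self, wal_path: str):
        self.vectors = {}                      # id -> vector (stands in for the ANN index)
        self.wal = open(wal_path, "a")

    def upsert(self, vec_id: str, vector: list) -> None:
        record = json.dumps({"id": vec_id, "vec": vector})
        self.wal.write(record + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())            # durable on disk before we ack
        self.vectors[vec_id] = vector          # only then update the live index
```

The ordering matters: if the process crashes after the fsync but before the index update, replaying the WAL on restart reapplies the write, so no acknowledged data is lost.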

When to choose on-disk or hybrid

On-disk scales to billions of vectors and reduces cost per GB; latency is higher and more variable due to I/O. The choice affects durability guarantees and operational cost. Hybrid designs (e.g. index on SSD with mmap, or tiered storage) balance capacity and latency.

On-disk pipeline: index and vectors on SSD; query loads only needed pages (e.g. graph nodes, IVF lists); cache hot data in RAM. Hybrid: hot segments in RAM, cold on disk or S3; query may merge results from both. Trade-off: scale and cost vs. latency and implementation complexity.
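One way to see the “query loads only needed pages” idea is a memory-mapped vector file: the OS pages in just the rows a coarse index selected. This sketch assumes a flat float32 file layout and uses NumPy’s memmap; the candidate IDs stand in for, say, the members of an IVF list:

```python
import numpy as np

def search_on_disk(path, num_vectors, dim, query, candidate_ids, k=5):
    """Sketch of on-disk search: memory-map the vector file and read
    only the rows a coarse index selected, so the full file never has
    to fit in RAM. Assumes a flat float32 matrix on disk."""
    vecs = np.memmap(path, dtype=np.float32, mode="r",
                     shape=(num_vectors, dim))
    cands = np.asarray(candidate_ids)
    # Indexing the memmap touches only the pages holding these rows.
    dists = np.linalg.norm(vecs[cands] - query, axis=1)
    order = np.argsort(dists)[:k]
    return cands[order], dists[order]
```

A real engine adds a RAM cache for hot pages on top of this, which is why on-disk latency is variable: cache hits look like in-memory queries, misses pay SSD read latency.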

Persistence and recovery

Even in-memory systems need durability: WAL and snapshots are written to disk so that after a crash or restart, the index can be restored by replaying the log and loading from disk. On-disk designs still use RAM for caching (e.g. graph nodes, codebooks); the full index does not have to fit in memory.
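Recovery can be sketched as “load the latest snapshot, then replay newer WAL records in order.” The file formats here are illustrative assumptions, not a real system’s on-disk layout:

```python
import json
import os

def recover(snapshot_path: str, wal_path: str) -> dict:
    """Sketch of restart recovery: bulk-load the last snapshot,
    then replay WAL records written after it, newest-last so later
    writes win. Paths and formats are illustrative."""
    index = {}
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            index = json.load(f)               # state as of the snapshot
    if os.path.exists(wal_path):
        with open(wal_path) as f:
            for line in f:                     # replay in write order
                rec = json.loads(line)
                index[rec["id"]] = rec["vec"]
    return index
```

Snapshots keep the WAL short: after a snapshot is written, older log entries can be truncated, so restart time is bounded by snapshot size plus the recent log tail.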

Frequently Asked Questions

Can an in-memory VDB be durable?

Yes. WAL and snapshots are written to disk; after restart you replay the log and restore. Durability is about “committed data survives crash,” not where the live index lives.

What is a hybrid setup?

Often: hot data or recent segments in RAM, older segments or cold collections on SSD/disk (or S3), with a single query path that may touch both.

Does on-disk mean no RAM usage?

No. On-disk indexes still use RAM for caching (e.g. graph nodes, codebooks). The difference is that the full index doesn’t have to fit in memory.

How do I decide for my workload?

Estimate index size and compare to available RAM; set a latency target. If you fit in RAM and need low latency, in-memory (with persistence) is simpler. If you don’t fit or want lower cost, use on-disk or tiered.
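This decision heuristic can be written down as a toy rule of thumb; the 0.8 RAM-headroom factor and the 10 ms threshold are assumptions for illustration, not industry constants:

```python
def choose_storage(index_bytes: int, ram_bytes: int,
                   p99_target_ms: float) -> str:
    """Toy decision rule for the heuristic above. The headroom
    factor and latency threshold are assumptions to tune."""
    fits = index_bytes < 0.8 * ram_bytes   # leave headroom for queries/OS
    if fits and p99_target_ms < 10:
        return "in-memory (with WAL + snapshots)"
    if fits:
        return "in-memory or hybrid"
    return "on-disk or tiered"
```

Treat the output as a starting point: re-run the estimate as the dataset grows, and validate the latency side with a benchmark on real queries.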