Index persistence: Loading indexes from disk

Index persistence means writing the in-memory index (for example, an HNSW graph, or IVF centroids and lists) to disk so that after a restart the vector database (VDB) can load it instead of rebuilding it from scratch. Loading from disk avoids a long warm-up and restores search capability quickly. This topic covers how persistence works, the load pipeline, and the trade-offs.

Summary

The index (e.g. HNSW, IVF) is written to disk; on restart the VDB loads it instead of rebuilding, which avoids a long warm-up and restores search quickly.
The serialized format mirrors the in-memory layout: graph edges, vector IDs, and optionally raw vectors or PQ codes. On startup the VDB reads the files (often using mmap for vectors), reconstructs the index, and optionally replays the write-ahead log (WAL) to apply updates made after the last checkpoint.
The format may be versioned; upgrades can require migration or a rebuild. Load time depends on index size and I/O; very large indexes may take minutes. Persistence makes in-memory or hybrid VDBs durable and restart-friendly.
Trade-off: fast recovery and no rebuild, at the cost of format versioning and load time for very large indexes.
Practical tip: schedule checkpoints so that WAL replay after load is bounded; use mmap when possible so the process becomes queryable before the full load completes.

How index persistence works

The index is typically serialized in a format that mirrors its in-memory layout: graph edges, vector IDs, and optionally the raw vectors (or PQ codes). On startup, the VDB reads these files (often using mmap for the vector data), reconstructs the graph or lists, and optionally replays the WAL to apply updates that happened after the last checkpoint. The result is a consistent, queryable index without a full rebuild.
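
To make the idea of a format that mirrors the in-memory layout concrete, here is a minimal sketch of a round trip for a toy graph index. Everything in it is an assumption made for illustration: the `TIDX` signature, the version field, the record layout, and the function names are hypothetical and do not correspond to any particular engine's on-disk format.

```python
import io
import struct

MAGIC = b"TIDX"        # hypothetical 4-byte file signature
FORMAT_VERSION = 1     # hypothetical format version
HEADER = struct.Struct("<4sIII")   # magic, version, dim, vector count
RECORD_HEAD = struct.Struct("<QI")  # vector id, neighbor count


def serialize_index(vectors, edges, dim):
    """Serialize {id: [floats]} and {id: [neighbor ids]} to bytes."""
    buf = io.BytesIO()
    buf.write(HEADER.pack(MAGIC, FORMAT_VERSION, dim, len(vectors)))
    for vid in sorted(vectors):
        neighbors = edges.get(vid, [])
        buf.write(RECORD_HEAD.pack(vid, len(neighbors)))
        buf.write(struct.pack(f"<{dim}f", *vectors[vid]))            # raw vector
        buf.write(struct.pack(f"<{len(neighbors)}Q", *neighbors))    # graph edges
    return buf.getvalue()


def deserialize_index(data):
    """Rebuild the in-memory dicts from bytes written by serialize_index."""
    magic, version, dim, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an index file")
    if version != FORMAT_VERSION:
        # A real engine would migrate or ask for a rebuild here.
        raise ValueError(f"unsupported format version {version}")
    offset = HEADER.size
    vectors, edges = {}, {}
    for _ in range(count):
        vid, n = RECORD_HEAD.unpack_from(data, offset)
        offset += RECORD_HEAD.size
        vectors[vid] = list(struct.unpack_from(f"<{dim}f", data, offset))
        offset += 4 * dim
        edges[vid] = list(struct.unpack_from(f"<{n}Q", data, offset))
        offset += 8 * n
    return vectors, edges, dim
```

Writing the returned bytes to a file at checkpoint time and passing the file contents to `deserialize_index` on startup gives back the same structures without recomputing any edges. The version check in the header is where a format upgrade would surface.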

Pipeline: at checkpoint or snapshot time, serialize the index and vector files to disk. On restart: read the index files → mmap or load the vectors → reconstruct the in-memory structures → replay the WAL from the checkpoint to the present → serve. With mmap, the process can accept queries while the OS pages in the rest of the data on demand; queries that touch pages not yet in memory are slower until those pages are read.
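
The replay step of the pipeline can be sketched as follows. This is an illustration under stated assumptions, not a description of any specific engine: the `Snapshot` class, the `(lsn, op, vec_id, vector)` record shape, and the `"upsert"`/`"delete"` operation names are all hypothetical. The key idea is that only WAL records with a log sequence number (LSN) greater than the checkpoint's LSN are applied.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical persisted state: index contents as of checkpoint_lsn."""
    checkpoint_lsn: int
    vectors: dict = field(default_factory=dict)


def recover(snapshot, wal):
    """Load the snapshot, then apply WAL records newer than the checkpoint.

    wal is an iterable of (lsn, op, vec_id, vector) tuples in LSN order.
    Returns the recovered index and the number of records replayed.
    """
    index = dict(snapshot.vectors)  # state as of the checkpoint
    applied = 0
    for lsn, op, vec_id, vector in wal:
        if lsn <= snapshot.checkpoint_lsn:
            continue  # already reflected in the persisted index
        if op == "upsert":
            index[vec_id] = vector
        elif op == "delete":
            index.pop(vec_id, None)
        else:
            raise ValueError(f"unknown WAL operation: {op!r}")
        applied += 1
    return index, applied
```

For example, with a snapshot taken at LSN 10 and a WAL holding records 9 through 12, only records 11 and 12 are replayed; the more often checkpoints are taken, the fewer records fall into that tail.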

Trade-offs and operational notes

Trade-offs: the persisted index format may be versioned, so upgrades can require a migration or a rebuild. Loading time depends on index size and I/O; very large indexes may still take minutes. Persistence is what makes in-memory or hybrid VDBs durable and restart-friendly while keeping the query path in RAM.

Practical tip: checkpoint often enough that the WAL accumulated since the last checkpoint stays small and replay after load is short. For very large indexes, expect load times in the minutes and plan for rolling restarts or standby replicas if you need high availability.
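
One way to reason about checkpoint frequency is to turn a replay-time budget into a WAL size threshold. The sketch below is a back-of-the-envelope model only: it assumes replay time grows linearly with WAL size, and the function names and the numbers in the usage note are made up for illustration. The replay rate has to be measured on your own system.

```python
def max_wal_bytes(replay_bytes_per_sec, max_replay_seconds):
    """Largest WAL tail that can be replayed within the time budget."""
    return replay_bytes_per_sec * max_replay_seconds


def should_checkpoint(wal_bytes, replay_bytes_per_sec, max_replay_seconds):
    """True once the WAL written since the last checkpoint reaches the threshold."""
    return wal_bytes >= max_wal_bytes(replay_bytes_per_sec, max_replay_seconds)
```

With an assumed replay rate of 50 MB/s and a 30-second budget, the threshold is 50 MB/s × 30 s = 1.5 GB of WAL, so a checkpoint would be triggered once roughly that much has been written since the previous one.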

Frequently Asked Questions

When is the index written to disk?

On checkpoint or snapshot: after a flush of the WAL (or periodically), the engine serializes the current index state to disk. Some systems do this on graceful shutdown or on a schedule; see your vendor’s persistence and snapshot docs.

Do I need to replay WAL after loading?

If there were writes after the last checkpoint, yes. The persisted index reflects the state at checkpoint time; replaying the WAL applies those later updates so the loaded index is up to date. Without replay, you’d be at the checkpoint state only.

How long does loading take?

It depends on index size and I/O. With mmap, the OS pages data in on demand, so the process may become queryable quickly while the remaining pages are read as queries touch them. A full load of a very large index can take minutes. Compare this with the warm-up time of rebuilding the index from scratch.
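
The on-demand behavior of mmap can be illustrated with Python's standard `mmap` module. This is a minimal sketch under assumptions: the file is a flat array of fixed-size float32 vectors, and `DIM`, `write_vectors`, and `MappedVectors` are hypothetical names chosen for the example. Opening the mapping does not read the whole file; the OS reads pages only when a vector in them is accessed.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                                # assumed vector dimensionality
RECORD = struct.Struct(f"<{DIM}f")     # one little-endian float32 vector


def write_vectors(path, vectors):
    """Write vectors back to back as fixed-size records."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(RECORD.pack(*v))


class MappedVectors:
    """Read-only view over a vector file; pages are loaded on first access."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // RECORD.size

    def get(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Touching this byte range is what triggers the page-in, if needed.
        return RECORD.unpack_from(self._map, i * RECORD.size)

    def close(self):
        self._map.close()
        self._file.close()
```

Constructing `MappedVectors` is cheap regardless of file size, which is why a process using this approach can start serving before all data is resident; the cost is paid later, per page, on first access.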

Can I load an index from a backup?

Yes, if the backup includes the persisted index files (and any vector files). Restore those to the expected paths and start the VDB; it will load them from disk. If you want point-in-time recovery, you may also need to replay the WAL from the backup. See snapshotting and backup.