Database Internals & Storage · Topic 112

Memory-mapped files (mmap) for vector storage

Memory-mapped files (mmap) let the operating system map a file on disk into the process’s virtual address space. The program accesses the file as if it were RAM; the OS pages in only the regions that are actually read, which is well-suited to the random-access pattern of nearest neighbor search over vectors in a vector database. This topic covers how mmap helps vector storage, trade-offs, and when to use it.

Summary

  • mmap maps a file into the process address space; the program reads and writes it as if it were RAM; the OS pages in only the regions actually accessed—good for the random-access pattern of nearest neighbor search over vectors.
  • Vector databases (VDBs) store vector segments or index payloads in files and mmap them instead of loading full files into the heap, which reduces RAM use. The OS page cache keeps hot pages in RAM; HNSW traversal and IVF probing benefit.
  • Often used for read-only immutable segments; writes go to WAL or mutable buffer. Drawbacks: page faults can make I/O less predictable; writes need copy-on-write or separate writable path.
  • Common in on-disk and hybrid VDBs to balance memory and latency.
  • Practical tip: prewarm key pages at startup to reduce cold p99; use mmap for read-only segments and WAL/mutable buffer for writes.

How mmap helps vector storage

VDBs can store vector segments or index payloads in files and mmap them instead of loading the full file into heap memory. That reduces RAM use (you don’t need to hold the entire dataset in memory), and the OS page cache naturally keeps hot pages in RAM. Random reads during graph traversal (e.g. HNSW) or IVF list probing benefit from this pattern.
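A minimal sketch of this access pattern using Python’s stdlib `mmap` (the file layout, vector dimensionality, and names are illustrative assumptions, not any particular database’s format):

```python
import mmap
import os
import struct
import tempfile

DIM = 4               # vector dimensionality (assumed for the sketch)
VEC_BYTES = DIM * 4   # 4 bytes per float32 component

# Build a small "segment" file: vectors stored back-to-back as float32.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for i in range(1000):
        f.write(struct.pack(f"<{DIM}f", *(float(i),) * DIM))

# Map the file read-only; nothing is loaded into the heap eagerly.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def read_vector(idx: int) -> tuple:
    """Random access: only the page(s) holding this vector are faulted in."""
    off = idx * VEC_BYTES
    return struct.unpack(f"<{DIM}f", mm[off:off + VEC_BYTES])

v42 = read_vector(42)
print(v42)   # -> (42.0, 42.0, 42.0, 42.0)
```

An index traversal that jumps between distant vectors only pays for the pages it actually touches; repeated visits hit the page cache instead of disk.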

Pipeline: at startup or when a segment is sealed, mmap the file; query touches pages as it traverses the index; OS faults in only accessed pages; hot pages stay in page cache. Writes go to WAL or mutable buffer; new segments are written and mmap’d when immutable. This keeps the read path simple and memory footprint bounded.
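One way the seal-then-map step might look, with a toy in-memory buffer standing in for the mutable write path (class and method names are illustrative assumptions):

```python
import mmap
import os
import tempfile

class SegmentWriter:
    """Toy mutable buffer: collects raw vector bytes, then seals to a file."""

    def __init__(self):
        self.buf = bytearray()

    def append(self, vec_bytes: bytes) -> None:
        # Writes land in the mutable buffer, never in a mapped segment.
        self.buf += vec_bytes

    def seal(self, path: str) -> mmap.mmap:
        # Write the now-immutable segment once, then map it read-only;
        # the OS faults pages in lazily as queries touch them.
        with open(path, "wb") as f:
            f.write(self.buf)
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

w = SegmentWriter()
w.append(b"\x01\x02\x03\x04")
w.append(b"\x05\x06\x07\x08")
path = os.path.join(tempfile.mkdtemp(), "segment.bin")
mm = w.seal(path)
print(mm[4:8])   # -> b'\x05\x06\x07\x08'
```

Because the segment is immutable once sealed, the read path needs no locking against in-place updates, and crash recovery only has to replay the WAL into a fresh buffer.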

Trade-offs and when to use mmap

Drawbacks: mmap I/O can be less predictable under load (page faults), and writes usually require a different path (copy-on-write or separate writable files). It’s a common technique in on-disk and hybrid VDBs to balance memory usage and latency.

Trade-off: lower RAM use and no explicit load step vs. less predictable p99 due to page faults. Practical tip: for strict latency SLOs, prewarm frequently accessed pages or use in-memory index; for large indexes where full load is not possible, mmap with adequate cache is the standard approach.

Frequently Asked Questions

Why not just read the whole file into RAM?

For large indexes (e.g. billions of vectors), loading the full file would exceed RAM or cause long startup. mmap lets you use only the pages you touch; the OS keeps hot pages in the page cache so repeated access is fast without explicit application-level caching.

Can I use mmap for writes?

mmap can be writable, but in-place updates complicate immutable segment design and crash consistency. Most VDBs use mmap for read-only segments and send writes to a WAL or mutable buffer; new segments are then written and mmap’d when sealed.

Why can mmap latency be unpredictable?

First access to a page triggers a page fault and disk I/O; under a mixed or cold workload, many faults can occur and p99 latency can spike. Prewarming (touching key pages at startup) or an adequate cache size helps; for strict SLOs, in-memory or preloaded indexes may be preferred.
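A portable prewarm is simply reading one byte per page so the kernel faults the whole mapping in before queries arrive; where available, `mmap.madvise` can hint the kernel asynchronously instead. A sketch (the file here is a stand-in for a sealed index segment):

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE   # OS page size, typically 4096 bytes

# An example 8-page file to map.
fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(PAGE * 8))
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def prewarm(m: mmap.mmap) -> int:
    """Fault in every page by reading one byte per page; returns pages touched."""
    touched = 0
    for off in range(0, len(m), PAGE):
        _ = m[off]          # first access triggers the page fault
        touched += 1
    return touched

pages = prewarm(mm)
print(pages)   # 8 for this 8-page file

# On platforms that expose it (Linux, macOS), a non-blocking hint works too:
if hasattr(mmap, "MADV_WILLNEED"):
    mm.madvise(mmap.MADV_WILLNEED)
```

Prewarming trades longer startup for a flatter cold-start p99; it only helps as long as the touched pages fit in (and stay in) the page cache.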

Does mmap work with tiered storage?

Typically mmap is used for local disk (or fast NVMe). For tiered storage, cold data on S3 is usually read via normal I/O or a separate load path, not mmap, because object storage isn’t a local filesystem. Hot tier can still be mmap’d.