High-level architecture of a Vector Database
A vector database typically has three main layers: ingestion (accepting vectors and optional metadata), indexing (building and maintaining structures like HNSW or IVF for fast ANN search), and query (taking a query vector and returning nearest neighbors). Optionally, it stores raw payloads and supports filtering by metadata before or after the vector search.
Summary
- Ingestion: accept points (vector + ID + optional metadata), batching, assign to collection.
- Indexing: build/maintain ANN index (e.g. HNSW, IVF) so queries don’t require full scan.
- Query: take query vector, traverse index, return nearest points (with optional filters and payload).
- Storage can be in-memory or on-disk; many systems separate vector index from metadata/payload; a full DB adds persistence, scaling, and query lifecycle vs. a vector library.
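The three layers above can be sketched as a toy in-memory collection. This is illustrative only: there is no real ANN index, so a full scan stands in for index traversal, but the shape of the API (upsert a point, query top-k with an optional metadata filter) mirrors the ingest/index/query split:

```python
import heapq
import math

class MiniVectorDB:
    """Toy in-memory collection illustrating the three layers.

    Ingestion stores points; "indexing" here is just the stored dict
    (a full scan stands in for a real ANN structure); query returns
    the k nearest points by Euclidean distance, with an optional
    metadata filter applied before scoring (pre-filtering).
    """

    def __init__(self):
        self.points = {}  # id -> (vector, payload)

    def upsert(self, point_id, vector, payload=None):
        # Ingestion: one point = vector + ID + optional payload.
        self.points[point_id] = (list(vector), payload or {})

    def query(self, vector, k=3, filter_fn=None):
        # Query: score candidates and keep the k smallest distances.
        candidates = (
            (pid, vec, payload)
            for pid, (vec, payload) in self.points.items()
            if filter_fn is None or filter_fn(payload)
        )
        scored = (
            (math.dist(vector, vec), pid, payload)
            for pid, vec, payload in candidates
        )
        return heapq.nsmallest(k, scored)

db = MiniVectorDB()
db.upsert("a", [0.0, 0.0], {"lang": "en"})
db.upsert("b", [1.0, 1.0], {"lang": "de"})
db.upsert("c", [0.1, 0.0], {"lang": "en"})
nearest = db.query([0.0, 0.0], k=2)  # -> (distance, id, payload) tuples
```

Everything that follows in this article replaces a piece of this sketch with a production mechanism: the dict with a durable store, the full scan with an ANN index, the filter callback with a metadata query language.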
Ingestion path
The ingestion path accepts vectors (and optional IDs and metadata) and assigns them to a collection or index. It often supports batching for throughput and may write to a write-ahead log (WAL) before updating the in-memory or on-disk index. Each inserted item is a point: one vector plus identity and optional payload. Ingestion may trigger incremental index updates (e.g. adding a node to an HNSW graph) or periodic compaction.
Throughput on the ingestion path is often limited by index update cost rather than raw write I/O. Batching many points per request and async index building can improve ingest rate. Some systems support bulk load (offline index build) for initial population and then switch to incremental for updates.
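A minimal sketch of this write path, with a WAL append before indexing and a size-based batch flush (the `ListIndex` and in-memory WAL are stand-ins for a real index and log file):

```python
import io
import json

class IngestBuffer:
    """Sketch of a batched ingestion path with a write-ahead log.

    Points are appended to the WAL first (for durability), buffered,
    and flushed to the index in batches so the index-update cost is
    amortized across many points.
    """

    def __init__(self, index, wal, batch_size=2):
        self.index = index      # anything with .add(points)
        self.wal = wal          # file-like object, append-only
        self.batch = []
        self.batch_size = batch_size

    def upsert(self, point_id, vector, payload=None):
        record = {"id": point_id, "vector": vector, "payload": payload}
        self.wal.write(json.dumps(record) + "\n")  # log before indexing
        self.batch.append(record)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.batch:
            self.index.add(self.batch)  # one index update, many points
            self.batch = []

class ListIndex:
    """Placeholder index: just accumulates points."""
    def __init__(self):
        self.points = []
    def add(self, points):
        self.points.extend(points)

wal = io.StringIO()
buf = IngestBuffer(ListIndex(), wal, batch_size=2)
buf.upsert("a", [0.0, 1.0])
buf.upsert("b", [1.0, 0.0])  # second point triggers a batch flush
```

On crash recovery, a real system replays WAL records written after the last flushed batch; the bulk-load path mentioned above is essentially a very large batch whose index is built offline.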
Indexing layer
The index is built so that queries don’t require a full scan. Instead, the system traverses a graph (e.g. HNSW) or probes clusters (e.g. IVF) to find approximate nearest neighbors quickly. Index parameters (e.g. M, efConstruction for HNSW; nlist, nprobe for IVF) affect build time, memory, and recall vs. latency. Storage can be in-memory, on disk, or distributed; many systems separate the vector index from metadata and payload storage (e.g. payloads in a columnar or document store).
The indexing layer is where vector databases differ most from generic key-value or document stores. Dedicated ANN structures are optimized for high-dimensional similarity; swapping in a different index type (e.g. switching from HNSW to IVF) changes the recall–latency trade-off and operational characteristics.
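The IVF side of that trade-off can be shown in a few lines. In this toy version the centroids are fixed (a real system learns them with k-means), vectors are bucketed under their nearest centroid, and a query scans only the `nprobe` closest buckets instead of every vector:

```python
import heapq
import math

class ToyIVF:
    """Toy inverted-file (IVF) index: one posting list per centroid.

    Raising nprobe scans more clusters, improving recall at the cost
    of latency — the same knob real IVF indexes expose.
    """

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one list per cluster

    def _nearest_clusters(self, vector, n):
        dists = ((math.dist(vector, c), i)
                 for i, c in enumerate(self.centroids))
        return [i for _, i in heapq.nsmallest(n, dists)]

    def add(self, point_id, vector):
        cluster = self._nearest_clusters(vector, 1)[0]
        self.lists[cluster].append((point_id, list(vector)))

    def search(self, vector, k=1, nprobe=1):
        candidates = []
        for cluster in self._nearest_clusters(vector, nprobe):
            candidates.extend(self.lists[cluster])
        scored = ((math.dist(vector, v), pid) for pid, v in candidates)
        return heapq.nsmallest(k, scored)

ivf = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]])
ivf.add("a", [0.5, 0.5])
ivf.add("b", [9.5, 9.5])
hits = ivf.search([0.4, 0.4], k=1, nprobe=1)  # scans one cluster only
```

A true neighbor that landed in an unprobed cluster is simply missed — that is where the recall loss of approximate search comes from, and why `nlist` (cluster count) and `nprobe` are tuned together.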
Query path
The query path takes a query vector and parameters (k, optional filters) and returns the k nearest points (IDs, scores, payloads). The engine may apply pre- or post-filtering for metadata. Understanding this architecture helps when comparing a vector DB to a relational DB or a vector library like Faiss: the DB adds persistence, scaling, filtering, and query lifecycle features on top of the core index.
Query latency is dominated by index traversal (graph hops or cluster probes) and distance computations. Tuning parameters like efSearch (HNSW) or nprobe (IVF) lets you trade recall for speed. Many systems also support returning stored payloads with each result so the application doesn’t need a separate lookup.
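Post-filtering, in particular, has a subtlety worth showing: the engine must over-fetch, because candidates removed by the filter come out of the top-k budget. A small sketch (exact k-NN stands in for the ANN traversal; the over-fetch factor is a heuristic, not a guarantee):

```python
import heapq
import math

def knn(points, query, k):
    """Exact k-NN over (id, vector, payload) tuples — a stand-in
    for the ANN index traversal described above."""
    scored = ((math.dist(query, v), pid, payload)
              for pid, v, payload in points)
    return heapq.nsmallest(k, scored)

def post_filtered_search(points, query, k, predicate, overfetch=4):
    """Post-filtering: retrieve k * overfetch candidates, then drop
    those whose payload fails the predicate. If the filter is very
    selective, fewer than k results may survive — the classic
    post-filtering failure mode."""
    hits = knn(points, query, k * overfetch)
    return [h for h in hits if predicate(h[2])][:k]

points = [
    ("a", [0.0, 0.0], {"year": 2021}),
    ("b", [0.1, 0.0], {"year": 2019}),
    ("c", [0.2, 0.0], {"year": 2021}),
]
results = post_filtered_search(
    points, [0.0, 0.0], k=2, predicate=lambda p: p["year"] == 2021
)
```

Pre-filtering avoids the over-fetch problem by restricting the candidate set first, but naively done it can defeat the index (the filtered subset may be poorly connected in a graph index), which is why engines implement filter-aware traversal.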
Distributed and cloud-native variants
In distributed deployments, sharding splits vectors across nodes; a coordinator fans out the query and merges results. Cloud-native designs may separate compute and storage for elasticity. See replication, load balancing, and Kubernetes for production architecture.
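The coordinator's merge step is a straightforward top-k merge over per-shard partial results. A sketch, where `query_fn` stands in for an RPC to one node and the shard results are hypothetical:

```python
import heapq

def scatter_gather(shards, query_fn, query, k):
    """Coordinator-side merge for a sharded deployment: each shard
    returns its local top-k (score, id) pairs; the coordinator
    merges them into a global top-k. Lower score = closer."""
    partials = []
    for shard in shards:  # real systems fan out concurrently
        partials.extend(query_fn(shard, query, k))
    return heapq.nsmallest(k, partials)

# Two hypothetical shards with precomputed local top-k results.
shard_results = {
    "shard-0": [(0.1, "a"), (0.7, "d")],
    "shard-1": [(0.3, "b"), (0.9, "e")],
}
merged = scatter_gather(
    shard_results.keys(),
    lambda shard, q, k: shard_results[shard][:k],
    query=None,
    k=2,
)
```

Note that each shard must return a full local top-k for the global top-k to be correct: a result in the global top-k is necessarily in its own shard's top-k, so merging local lists loses nothing.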
Frequently Asked Questions
What is the difference between a vector database and a vector library (e.g. Faiss)?
A vector library provides indexing and search in-process; a vector DB adds persistence, multi-tenancy, metadata filtering, a query API, and often distributed deployment. Use a library for embedded or single-app use; use a DB for shared, durable, scalable search.
Where does embedding happen in the architecture?
Usually in the application layer: your service calls an embedding API or model, then sends the resulting vectors to the VDB. Some systems support “query by text” and run the embedding inside the DB. Either way, the VDB stores and indexes vectors; it doesn’t typically train or host the embedding model.
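A sketch of that division of labor, where `embed` is a hypothetical stand-in for a real embedding model or API call (here it just derives a deterministic pseudo-vector from a hash, so the example is self-contained):

```python
import hashlib

def embed(text, dim=4):
    """Hypothetical stand-in for an embedding model: maps text to a
    deterministic pseudo-vector. A real application would call a
    model or an embedding API here instead."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

# Application layer: embed first, then hand the vector to the VDB.
vector = embed("vector databases store embeddings")
# db.upsert(point_id="doc-1", vector=vector, payload={"source": "docs"})
#   ^ hypothetical client call — the VDB only sees the vector + payload
```

The important property is that the same `embed` function (i.e., the same model and version) is used at both ingest time and query time; mixing embedding models across the two paths makes distances meaningless.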
Do all vector DBs have the same architecture?
No. Some are in-memory only; some are disk-first; some separate compute and storage. Index types (HNSW, IVF, etc.) and support for filtering, hybrid search, and ACID vary. The three layers (ingest, index, query) are a common mental model.
How does the index get updated on new inserts?
Depends on the index. HNSW supports incremental inserts (new nodes linked into the graph). IVF may require periodic rebuild or delta segments. See real-time vs. offline indexing and atomic updates.
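The HNSW case can be illustrated with a single-layer proximity graph (real HNSW uses multiple layers and a greedy graph search instead of the full scan used here): each new node is linked to its M nearest existing nodes, so the graph grows insert by insert without a rebuild.

```python
import heapq
import math

class ToyGraphIndex:
    """Single-layer proximity graph illustrating incremental insert.

    Each inserted node is linked bidirectionally to its m nearest
    existing nodes. HNSW does the same across several layers, and
    finds the neighbors with a greedy graph walk rather than the
    full scan used in this sketch.
    """

    def __init__(self, m=2):
        self.m = m
        self.vectors = {}  # id -> vector
        self.edges = {}    # id -> set of neighbor ids

    def insert(self, point_id, vector):
        dists = ((math.dist(vector, v), pid)
                 for pid, v in self.vectors.items())
        neighbors = [pid for _, pid in heapq.nsmallest(self.m, dists)]
        self.vectors[point_id] = list(vector)
        self.edges[point_id] = set(neighbors)
        for n in neighbors:                 # bidirectional links
            self.edges[n].add(point_id)

g = ToyGraphIndex(m=1)
g.insert("a", [0.0, 0.0])
g.insert("b", [1.0, 0.0])
g.insert("c", [0.1, 0.0])  # linked to its nearest neighbor, "a"
```

IVF-style indexes are harder to update in place because cluster assignments drift as data grows; hence the delta-segment pattern, where new points go to a small scannable segment that is periodically merged into the main index.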