Incremental building of HNSW graphs
HNSW is an incremental index: you can add new vectors at any time without rebuilding from scratch. For each new vector, the algorithm assigns it a random maximum layer (using the same level distribution as at build time), then at each layer from top to bottom runs an approximate search to find the nearest neighbors in the current graph and connects the new node to up to M of them. Existing nodes may also get new links pointing to the inserted node (bi-directional links).
Summary
- Insert cost dominated by internal searches per layer (efConstruction) and pruning. For high-throughput ingestion, lower efConstruction or buffer + batch build; see streaming ingestion.
- Insertion order can bias the graph; for best quality, bulk load first then add incrementally. Updates and deletes have their own implications.
- Pipeline: assign max layer → for each layer run ANN search with efConstruction-sized candidate list → select up to M neighbors → create bi-directional edges; existing neighbors may get new links back to the inserted node.
- Concurrent insertion is implementation-dependent; see parallelizing HNSW index construction for build-time parallelism.
- Trade-off: lower efConstruction speeds inserts but can reduce graph quality over time; periodic full rebuild with high efConstruction can restore recall.
How incremental insertion works
When a new vector is inserted, the algorithm first draws a random maximum layer (e.g. with probability 1/M for each additional layer above 0). Then for each layer from that max down to 0, it runs an approximate nearest neighbor search in the current graph—the same greedy search used at query time—with a candidate list of size efConstruction. From the candidates it selects up to M neighbors (often with a pruning strategy) and creates bi-directional edges: the new node links to these neighbors, and each of these neighbors adds a link back to the new node. So the graph grows by one node and up to M×(max_layer+1) new edges (and reciprocal updates on existing nodes).
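The level draw described above can be sketched directly from the distribution it names (a hedged, library-free sketch; `draw_max_layer` and the seeded RNG are illustrative, not any particular library's API):

```python
import math
import random

def draw_max_layer(M: int, rng: random.Random) -> int:
    """Draw a node's maximum layer: P(level >= l) = M^(-l).

    Implemented by scaling an exponential draw by mL = 1 / ln(M),
    as in the standard HNSW formulation.
    """
    mL = 1.0 / math.log(M)
    return int(-math.log(1.0 - rng.random()) * mL)

rng = random.Random(42)
levels = [draw_max_layer(16, rng) for _ in range(100_000)]
# With M=16, roughly a 1/16 fraction of nodes should reach layer 1 or above.
frac_above_0 = sum(l >= 1 for l in levels) / len(levels)
```

Most nodes land on layer 0, which is why most inserts only touch the bottom layer of the graph.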
No global rebalancing is performed; the existing graph is left as is except for the new node and the links to and from it. That is why insertion order can influence quality: early nodes shape the graph, and later inserts adapt to whatever structure exists.
Insert cost and throughput
Insert cost is dominated by the internal searches at each layer (controlled by efConstruction) and by any pruning or link updates. So ingestion throughput (vectors per second) depends on efConstruction and dataset size—larger graphs mean each insertion search is slower. For high-throughput streaming ingestion, teams often use a lower efConstruction for incremental inserts and occasionally run a background rebuild with higher efConstruction, or buffer inserts and do small batch builds.
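The buffer-and-batch pattern can be sketched as a thin wrapper around whatever batch-insert entry point your index exposes (`add_batch` here is a hypothetical callback, not a real library API):

```python
from typing import Callable, List, Sequence

class BufferedInserter:
    """Buffer incoming vectors and flush them to the index in batches.

    Batching amortizes per-call overhead and lets a background worker own
    the expensive efConstruction-sized insertion searches.
    """

    def __init__(self, add_batch: Callable[[List[Sequence[float]]], None],
                 batch_size: int = 1024):
        self.add_batch = add_batch
        self.batch_size = batch_size
        self.buffer: List[Sequence[float]] = []

    def insert(self, vector: Sequence[float]) -> None:
        self.buffer.append(vector)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.add_batch(self.buffer)
            self.buffer = []

# Usage: a plain list stands in for a real index's batch-insert call.
batches = []
ins = BufferedInserter(batches.append, batch_size=3)
for v in ([1.0], [2.0], [3.0], [4.0]):
    ins.insert(v)
ins.flush()
```

In a real pipeline, `flush` would typically run on a background thread so that the producer is never blocked on insertion searches.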
Roughly, one insert costs O(log n × efConstruction) distance computations. With efConstruction=200 and 1M vectors, one insert can be on the order of milliseconds; at 10M vectors it is only modestly slower, since the number of layers traversed grows logarithmically. So thousands of inserts per second typically require lowering efConstruction or batching.
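A back-of-envelope estimator makes this concrete. Every constant below is an illustrative assumption (layer count ≈ log_M(n), each search step examines ~M neighbors, and `comps_per_ms` is a made-up hardware- and dimension-dependent rate); measure your own numbers rather than trusting these:

```python
import math

def est_insert_cost(n: int, ef_construction: int, M: int = 16,
                    comps_per_ms: float = 10_000.0) -> float:
    """Back-of-envelope single-insert latency in milliseconds.

    Assumes ~log_M(n) layers are traversed, each insertion search expands
    about efConstruction candidates, and each expansion examines ~M
    neighbors. comps_per_ms is an assumed distance-computation rate.
    """
    layers = max(1, math.ceil(math.log(n, M)))
    comps = layers * ef_construction * M
    return comps / comps_per_ms

one_m = est_insert_cost(1_000_000, 200)   # ~1.6 ms under these assumptions
ten_m = est_insert_cost(10_000_000, 200)  # slightly larger: one more layer
```

The point of the sketch is the shape of the curve, not the absolute numbers: cost scales linearly in efConstruction and only logarithmically in n.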
Quality and ordering
Incremental build does not rebalance the whole graph, so insertion order can bias the structure. For best quality, bulk load the initial data with a fixed efConstruction (optionally with parallel construction), then add new data incrementally. Random or shuffled order usually yields a more uniform graph than strictly ordered data (e.g. sorted by time or ID), which can concentrate early links in a narrow region; pathological orders that noticeably hurt recall are rare but possible.
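Shuffling the bulk-load order is a one-liner and cheap insurance against order bias (a minimal sketch; the seeded RNG is just for reproducibility):

```python
import random

def shuffled_load_order(items, seed: int = 0) -> list:
    """Return a shuffled copy of the insert order.

    Shuffling before bulk load avoids feeding the index long runs of
    time- or ID-correlated vectors, which can bias early graph structure.
    """
    order = list(items)
    random.Random(seed).shuffle(order)
    return order

ids = list(range(10))
order = shuffled_load_order(ids, seed=1)
```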
Practical tip: when designing for streaming or append-heavy workloads, choose efConstruction so that single-insert latency meets your SLA (e.g. < 10 ms); measure recall@k periodically and trigger a full rebuild if it drops below a threshold. See handling high-frequency updates (streaming ingestion) and measuring index build time for sizing and monitoring the pipeline.
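The recall@k check that gates a rebuild can be sketched as follows (the 0.95 threshold is an illustrative SLA, not a recommendation; compare approximate results against exact k-NN on a held-out query set):

```python
def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of exact k-NN results recovered by the approximate search.

    approx_ids / exact_ids: per-query lists of the k returned / true IDs.
    """
    hits = total = 0
    for approx, exact in zip(approx_ids, exact_ids):
        hits += len(set(approx) & set(exact))
        total += len(exact)
    return hits / total if total else 0.0

REBUILD_THRESHOLD = 0.95  # illustrative; tune per workload

def needs_rebuild(approx_ids, exact_ids,
                  threshold: float = REBUILD_THRESHOLD) -> bool:
    return recall_at_k(approx_ids, exact_ids) < threshold

# Two queries, k=2: first fully recovered, second half recovered -> 0.75.
r = recall_at_k([[1, 2], [3, 9]], [[1, 2], [3, 4]])
```

Running this check on a schedule (e.g. hourly against a fixed probe set) turns the vague "recall drifts down" signal into an actionable trigger.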
Pipeline: from insert request to graph update
The pipeline for one insert: (1) receive vector and optional ID; (2) draw max layer; (3) for layer = max down to 0: run greedy search with candidate list size efConstruction, select up to M neighbors (with pruning if applicable), create bi-directional edges; (4) persist or update index. Existing nodes that gain a new neighbor may exceed M links; implementations may prune to keep at most M or M0 per layer. So the critical parameters are efConstruction (search quality during insert) and M (max degree).
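Steps (3) of the pipeline can be sketched for a single layer: a greedy best-first search with an efConstruction-sized result heap, selection of the M closest candidates, and reciprocal-edge creation with a simple keep-M-closest prune. This is a hedged teaching sketch (real implementations use smarter pruning heuristics that preserve graph connectivity, and `graph`/`vectors` are illustrative dict-based structures):

```python
import heapq
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def layer_search(graph, vectors, q, entry, ef):
    """Greedy best-first search on one layer; returns up to ef closest nodes.

    graph: {node_id: set(neighbor_ids)}; vectors: {node_id: vector}.
    """
    visited = {entry}
    candidates = [(dist(q, vectors[entry]), entry)]   # min-heap of frontier
    results = [(-dist(q, vectors[entry]), entry)]     # max-heap of best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef:
            break  # closest frontier node is worse than the worst result
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                nd = dist(q, vectors[nb])
                if len(results) < ef or nd < -results[0][0]:
                    heapq.heappush(candidates, (nd, nb))
                    heapq.heappush(results, (-nd, nb))
                    if len(results) > ef:
                        heapq.heappop(results)
    return sorted((-d, n) for d, n in results)  # ascending by distance

def insert_on_layer(graph, vectors, new_id, q, entry, ef, M):
    """Connect new_id to up to M nearest candidates; prune back-links to M."""
    vectors[new_id] = q
    neighbors = [n for _, n in layer_search(graph, vectors, q, entry, ef)[:M]]
    graph[new_id] = set(neighbors)
    for nb in neighbors:
        graph[nb].add(new_id)   # reciprocal edge back to the inserted node
        if len(graph[nb]) > M:  # naive prune: keep the M closest links
            keep = sorted(graph[nb],
                          key=lambda x: dist(vectors[nb], vectors[x]))[:M]
            graph[nb] = set(keep)

# Usage: a tiny 2-D layer; insert a point near the origin cluster.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (5.0, 5.0)}
graph = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
insert_on_layer(graph, vectors, 4, (0.1, 0.1), entry=3, ef=4, M=2)
```

Note that the naive keep-M-closest prune can sever links that kept distant regions reachable; production implementations use the paper's diversity-aware heuristic instead, precisely to avoid that.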
Trade-offs and practical tips
Trade-off: lower efConstruction speeds inserts and keeps ingestion latency low but can produce a weaker graph and lower recall over time. Higher efConstruction improves graph quality but slows each insert. Practical tip: for mixed workload (bulk + incremental), do initial bulk load with high efConstruction, then use lower efConstruction for incremental inserts; if recall drifts down, schedule a periodic full rebuild. For updates and deletes, updating a node in an HNSW graph and deleting a node (orphan problem) describe the mechanics and trade-offs.
Frequently Asked Questions
Do I need to rebuild when I add many vectors?
Not required; incremental insert works. If recall drops over time, consider a periodic full rebuild with high efConstruction.
Can multiple threads insert at once?
Depends on the implementation. Some support concurrent insertion with locking; see parallelizing HNSW index construction.
Does insertion order affect recall?
It can. Random or shuffled order usually gives a more uniform graph than strictly ordered data; pathological orders that noticeably hurt recall are rare but possible.
How fast is one insert?
Roughly O(log n × efConstruction) distance computations per insert. With efConstruction=200 and 1M vectors, one insert can be on the order of milliseconds.