Updating a node in an HNSW graph
Updating a node in an HNSW graph usually means changing the vector (and optionally the metadata) associated with an existing vertex. The graph topology (layer assignment and edges) was built for the old vector, so a true "in-place" update that keeps the same node ID can leave the graph suboptimal for the new vector.
Summary
- Common approach: delete + insert under the same logical ID (see deleting a node / orphan problem). The alternative is to overwrite the vector in place and leave the neighbors unchanged, which can hurt recall if the new vector is far from the old one.
- For atomic updates, a system may hide the old node and expose the new one in a single step, then clean up with background compaction.
- Pipeline: update request → either in-place overwrite (vector only) or delete + insert (full graph consistency); metadata-only updates need no graph change.
- Trade-off: in-place is fast and simple but can degrade recall; delete+insert preserves quality at the cost of delete handling and extra insert work.
Delete-plus-insert vs. in-place
Many implementations treat an update as a delete plus insert: mark or remove the old node, then insert a new node with the new vector under the same logical (external) ID. That preserves correctness and keeps the graph consistent, but it costs a full insert and can leave short-term orphans until the new node is linked.
Alternatively, the vector stored at the node is overwritten in place, and the neighbors and layer assignment are left unchanged. Queries may then follow edges that were chosen for the old vector, which can hurt recall if the new vector is far from the old one.
Delete+insert is safer for recall when the vector change is large; in-place is faster and simpler, and works best when updates are small or rare.
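The difference between the two strategies can be seen on a toy graph. The sketch below is not HNSW: it assumes a single layer, brute-force neighbor selection, hard deletes, and no neighbor pruning, and the names `ToyGraph`, `update_in_place`, and `update_delete_insert` are hypothetical. It only illustrates which edges each strategy leaves behind.

```python
import math


class ToyGraph:
    """Single-layer proximity graph; a simplified stand-in for one HNSW layer."""

    def __init__(self, m=2):
        self.m = m          # number of neighbors linked on insert
        self.vectors = {}   # node id -> vector
        self.edges = {}     # node id -> set of neighbor ids

    def insert(self, node_id, vector):
        # Link the new node to its m nearest existing nodes (brute force).
        nearest = sorted(
            self.vectors, key=lambda n: math.dist(self.vectors[n], vector)
        )[: self.m]
        self.vectors[node_id] = vector
        self.edges[node_id] = set(nearest)
        for n in nearest:
            self.edges[n].add(node_id)

    def delete(self, node_id):
        # Hard delete: drop the node and every edge that points to it.
        for n in self.edges.pop(node_id):
            self.edges[n].discard(node_id)
        del self.vectors[node_id]

    def update_in_place(self, node_id, vector):
        # Overwrite the vector only; edges stay as chosen for the old vector.
        self.vectors[node_id] = vector

    def update_delete_insert(self, node_id, vector):
        # Remove the node, then relink it from scratch under the same ID.
        self.delete(node_id)
        self.insert(node_id, vector)


def build():
    # Two well-separated clusters: ids 0-2 near the origin, ids 3-5 near (10, 10).
    g = ToyGraph(m=2)
    points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
              (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
    for node_id, vec in enumerate(points):
        g.insert(node_id, vec)
    return g


# Move node 0 from the first cluster into the second one.
in_place = build()
in_place.update_in_place(0, (10.2, 10.1))
print(sorted(in_place.edges[0]))   # [1, 2]: still linked to the old cluster

rebuilt = build()
rebuilt.update_delete_insert(0, (10.2, 10.1))
print(sorted(rebuilt.edges[0]))    # [3, 4]: linked to the new cluster
```

After the in-place update, node 0 sits in the second cluster but is reachable only through nodes of the first, which is the recall risk described above. A real implementation would also have to repair the neighbors that lost a link during the delete.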
Metadata-only updates
If only metadata (payload) changes and the vector is unchanged, no graph change is needed. The node’s position in the graph and its neighbors remain valid; only the payload store is updated. This is cheap and avoids any impact on recall or graph structure.
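A minimal sketch of this separation, assuming the payload lives in its own map next to the graph. The `Store` class and its `graph_version` counter are hypothetical and exist only to show that a payload-only update never touches vectors or edges.

```python
class Store:
    """Keeps graph data (vectors, edges) apart from payloads."""

    def __init__(self):
        self.vectors = {}        # node id -> vector
        self.edges = {}          # node id -> set of neighbor ids
        self.payloads = {}       # node id -> metadata dict
        self.graph_version = 0   # bumped whenever graph data changes

    def add(self, node_id, vector, neighbors, payload):
        self.vectors[node_id] = tuple(vector)
        self.edges[node_id] = set(neighbors)
        self.payloads[node_id] = dict(payload)
        self.graph_version += 1

    def update(self, node_id, vector=None, payload=None):
        if payload is not None:
            # Payload path: merge the new fields, leave the graph alone.
            self.payloads[node_id].update(payload)
        if vector is not None and tuple(vector) != self.vectors[node_id]:
            # Vector path: a real index would relink or delete+insert here.
            self.vectors[node_id] = tuple(vector)
            self.graph_version += 1


store = Store()
store.add(7, (0.1, 0.2), neighbors={3, 4}, payload={"lang": "en"})
before = store.graph_version
store.update(7, payload={"lang": "de"})
print(store.graph_version == before)   # True: no graph change
```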
Atomic behavior
For atomic updates in a vector database, the system may hide the old node and expose the new one in a single logical step, leaving compaction or background repair to clean up afterwards. Whether this extra machinery is worthwhile depends on how often updates occur and how critical recall and consistency are. Many systems offer an “upsert” or “update” API that either does in-place overwrite or delete+insert under the hood; check product docs for semantics.
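One way to picture the hide-and-expose step is a mapping from logical IDs to internal node IDs that is swapped under a lock. The sketch below is an assumption about how such a layer could look, not any particular product's design: `AtomicIndex`, `upsert`, `visible`, and `compact` are hypothetical names, and graph linking is omitted.

```python
import threading


class AtomicIndex:
    """Logical IDs map to internal nodes; an update swaps the mapping."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_internal = 0
        self.vectors = {}         # internal id -> vector
        self.live = {}            # logical id -> internal id seen by queries
        self.tombstones = set()   # internal ids hidden from queries

    def upsert(self, logical_id, vector):
        with self._lock:
            # Store the new node first (a real index would also link it).
            new_internal = self._next_internal
            self._next_internal += 1
            self.vectors[new_internal] = vector
            # Expose the new node and hide the old one in the same step.
            old_internal = self.live.get(logical_id)
            self.live[logical_id] = new_internal
            if old_internal is not None:
                self.tombstones.add(old_internal)

    def visible(self):
        # What a query is allowed to return: live nodes only.
        with self._lock:
            return {lid: self.vectors[iid] for lid, iid in self.live.items()}

    def compact(self):
        # Background cleanup: physically drop the hidden nodes.
        with self._lock:
            for iid in self.tombstones:
                del self.vectors[iid]
            self.tombstones.clear()


index = AtomicIndex()
index.upsert("doc-1", (0.0, 0.0))
index.upsert("doc-1", (1.0, 1.0))
print(index.visible())       # {'doc-1': (1.0, 1.0)}
print(len(index.vectors))    # 2: the old node is hidden but still stored
index.compact()
print(len(index.vectors))    # 1
```

Readers never see both versions or neither, because the mapping change and the tombstone are applied while the lock is held. The cost is that hidden nodes occupy space until compaction runs.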
Trade-offs and practical tips
In-place update is fast and avoids delete/insert cost but can leave the graph suboptimal; delete+insert preserves quality at the cost of handling orphans and extra insert work. As a practical tip, in-place may be acceptable for frequent small vector changes; for large changes or strict recall requirements, prefer delete+insert or a periodic full rebuild.
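The tip can be turned into a simple dispatch rule. The following is a sketch under stated assumptions: the function name `choose_update_strategy` is hypothetical, Euclidean distance is used as the measure of change, and the `max_shift` threshold of 0.1 is an arbitrary placeholder that would need tuning against measured recall on real data.

```python
import math


def choose_update_strategy(old_vec, new_vec, max_shift=0.1):
    """Pick an update path from how far the vector moved.

    max_shift is an assumed, data-dependent threshold, not a recommended value.
    """
    shift = math.dist(old_vec, new_vec)
    if shift == 0:
        return "metadata_only"    # vector unchanged: no graph work
    if shift <= max_shift:
        return "in_place"         # small move: old neighbors likely still fit
    return "delete_insert"        # large move: relink for the new position


print(choose_update_strategy((0.0, 0.0), (0.0, 0.0)))    # metadata_only
print(choose_update_strategy((0.0, 0.0), (0.05, 0.0)))   # in_place
print(choose_update_strategy((0.0, 0.0), (3.0, 4.0)))    # delete_insert
```

A fixed absolute threshold is the simplest choice; a threshold relative to the distance to the node's current neighbors may track recall more closely, at the cost of extra distance computations.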
Frequently Asked Questions
Should I update in place or delete+insert?
Delete+insert is safer for recall; in-place is faster and simpler but can degrade quality if the vector changes a lot.
Can I update only metadata?
Yes. If the vector is unchanged, no graph change is needed; only the payload/metadata store is updated.
Do neighbors get updated when I change a vector?
Not in a standard in-place update. Neighbors still point to the node; only the stored vector value changes. Delete+insert rebuilds the node's links for the new vector.
How do vector DBs expose updates?
Usually an “upsert” or “update” API that either does in-place overwrite or delete+insert under the hood; check the product docs for semantics.