
Database Internals & Storage · Topic 114

How VDBs handle “Deletes” (Soft vs. Hard deletes)

A soft delete marks a point or segment as deleted (e.g. via a tombstone or bitmap) so that it is excluded from search results, but the underlying vector and index structures are not immediately rewritten. A hard delete removes the data and updates the index so that the space is reclaimed. Vector databases (VDBs) often use soft deletes for speed and simplicity and defer hard deletes to compaction or garbage collection. This topic covers why VDBs prefer soft deletes, the delete pipeline, and the trade-offs between the two approaches.
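To make the distinction concrete, here is a minimal sketch of a toy in-memory flat store. The class and method names are hypothetical, not any real VDB’s API: `soft_delete` only records a tombstone, while `hard_delete` actually removes the stored vector.

```python
import math


class FlatStore:
    """Toy brute-force vector store; names are hypothetical, for illustration only."""

    def __init__(self):
        self.vectors = {}        # point ID -> vector (the stored data)
        self.tombstones = set()  # soft-deleted point IDs

    def upsert(self, pid, vec):
        self.vectors[pid] = list(vec)
        self.tombstones.discard(pid)  # in this toy, re-inserting revives a soft-deleted ID

    def soft_delete(self, pid):
        # Mark only: the vector stays in storage and nothing is rewritten.
        self.tombstones.add(pid)

    def hard_delete(self, pid):
        # Remove the data itself so its space is reclaimed.
        self.vectors.pop(pid, None)
        self.tombstones.discard(pid)

    def search(self, query, k):
        # Brute-force nearest neighbours, skipping tombstoned IDs.
        live = [(math.dist(v, query), pid)
                for pid, v in self.vectors.items()
                if pid not in self.tombstones]
        return [pid for _, pid in sorted(live)[:k]]


store = FlatStore()
for pid, vec in [(1, [0.0, 0.0]), (2, [0.1, 0.0]), (3, [5.0, 5.0])]:
    store.upsert(pid, vec)

store.soft_delete(2)
print(store.search([0.0, 0.0], k=2), len(store.vectors))  # [1, 3] 3
store.hard_delete(2)
print(store.search([0.0, 0.0], k=2), len(store.vectors))  # [1, 3] 2
```

Both calls hide point 2 from search; only the hard delete shrinks storage.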

Summary

  • Soft delete: mark point/segment as deleted (tombstone, bitmap, or mask); exclude from search and results; index not rewritten. Hard delete: remove data and reclaim space; often done at compaction or GC.
  • Graph indexes like HNSW don’t support efficient in-place node removal without leaving orphaned nodes, so VDBs record deletes in a separate structure and filter deleted IDs at query time.
  • Soft deletes keep atomicity and WAL replay simple; hard deletes reduce storage and can improve long-term query performance. Knowing which one your VDB uses matters for compliance (e.g. right-to-delete) and capacity planning.
  • Pipeline: delete request → append to WAL → mark ID in delete set/bitmap → filter at query time; compaction/GC later drops data and reclaims space.
  • Practical tip: for GDPR-style right-to-delete, check whether your VDB supports purge/force or relies on compaction; tune compaction so deleted data is reclaimed within your policy window.

Why VDBs prefer soft deletes first

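Graph indexes like HNSW don’t support efficient in-place removal of a node. Other nodes reach parts of the graph through the removed node’s edges, so simply dropping it can leave neighbors orphaned (unreachable from the entry point) and hurt graph connectivity; repairing those links is the expensive part. The typical approach is therefore to record the delete in a separate structure (e.g. a delete list, bitmap, or segment-level mask) and filter out deleted IDs during search and when returning results. Later, a merge or rebuild compacts segments and drops the deleted vectors (hard delete).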
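The orphaning problem can be seen on a tiny toy graph: a single-layer adjacency list, far simpler than HNSW’s layered structure, with illustrative helpers `reachable` and `hard_remove`. Node 3 is linked only through node 2, so naively removing node 2 cuts it off from the entry point, while a tombstone keeps the path intact.

```python
from collections import deque


def reachable(graph, entry):
    """Return every node reachable from `entry` by following edges (BFS)."""
    seen, queue = {entry}, deque([entry])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def hard_remove(graph, node):
    """Naive in-place removal: drop the node and every edge that points at it,
    without repairing its neighbours' links."""
    return {n: [m for m in nbrs if m != node]
            for n, nbrs in graph.items() if n != node}


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Hard delete: node 3 becomes an orphan, unreachable from entry point 0.
print(sorted(reachable(hard_remove(graph, 2), 0)))  # [0, 1]

# Soft delete: keep node 2 for traversal, hide it from results.
tombstones = {2}
print(sorted(reachable(graph, 0) - tombstones))     # [0, 1, 3]
```

An engine could instead reconnect node 2’s neighbors during removal, but that repair work is exactly the cost soft deletes avoid.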

Pipeline: client issues delete → engine appends delete record to WAL → updates delete set or bitmap with the point ID → at query time, any result whose ID is in the delete set is excluded; when compaction runs, segments containing only or mostly deleted points are merged and space is reclaimed.
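A minimal sketch of this pipeline over a toy collection of brute-force segments. The `Collection` and `Segment` classes, the `("delete", id)` WAL record, and the use of a Python `set` in place of a bitmap are all illustrative assumptions; a real engine would also flush the WAL durably before acknowledging the delete.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Segment:
    ids: list
    vectors: list


@dataclass
class Collection:
    segments: list
    wal: list = field(default_factory=list)    # append-only log (in memory here)
    deleted: set = field(default_factory=set)  # delete set, standing in for a bitmap

    def delete(self, pid):
        self.wal.append(("delete", pid))  # 1. log the delete first
        self.deleted.add(pid)             # 2. then mark the ID in the delete set

    def search(self, query, k):
        # 3. any candidate whose ID is in the delete set is excluded
        hits = [(math.dist(v, query), pid)
                for seg in self.segments
                for pid, v in zip(seg.ids, seg.vectors)
                if pid not in self.deleted]
        return [pid for _, pid in sorted(hits)[:k]]

    def compact(self):
        # 4. later: merge all segments into one, physically dropping deleted points
        ids, vecs = [], []
        for seg in self.segments:
            for pid, v in zip(seg.ids, seg.vectors):
                if pid not in self.deleted:
                    ids.append(pid)
                    vecs.append(v)
        reclaimed = sum(len(s.ids) for s in self.segments) - len(ids)
        self.segments = [Segment(ids, vecs)]
        self.deleted.clear()  # every tombstoned point is gone, so the marks are too
        return reclaimed


coll = Collection([Segment([1, 2], [[0.0, 0.0], [1.0, 0.0]]),
                   Segment([3, 4], [[2.0, 0.0], [3.0, 0.0]])])
coll.delete(2)
print(coll.search([0.0, 0.0], k=2))  # [1, 3]
print(coll.compact())                # 1
print(coll.segments[0].ids)          # [1, 3, 4]
```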

Soft vs. hard: trade-offs

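Soft deletes keep atomicity and WAL replay simple: a delete is one appended log record plus one mark in the delete set, so replaying it after a crash only re-sets that mark and involves no index surgery. Hard deletes reduce storage and can improve long-term query performance, since searches no longer visit and discard dead entries. Understanding your VDB’s delete semantics is important for compliance (e.g. right-to-delete) and capacity planning.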
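Replay stays simple because rebuilding the delete set from the log is just re-applying set operations, and applying the same records again yields the same state. The sketch assumes a hypothetical WAL of `("delete", id)` and `("upsert", id)` records in which a later upsert revives an ID; real record formats and upsert semantics vary by engine.

```python
def replay(wal, snapshot_deleted=frozenset()):
    """Rebuild the delete set after a crash by re-applying WAL records in order.

    `snapshot_deleted` is the delete set from the last checkpoint, which may
    already include some of these records' effects.
    """
    deleted = set(snapshot_deleted)
    for op, pid in wal:
        if op == "delete":
            deleted.add(pid)      # setting an already-set mark is harmless
        elif op == "upsert":
            deleted.discard(pid)  # a later upsert makes the ID live again
    return deleted


wal = [("delete", 7), ("upsert", 8), ("delete", 9), ("upsert", 7)]
state = replay(wal)
print(state)                                # {9}
print(replay(wal, snapshot_deleted=state))  # {9}
```

A hard delete, by contrast, would typically have to redo graph repair during replay, and that repair would itself need to be crash-safe.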

Trade-off: fast, simple deletes and no index rewrite vs. storage not reclaimed until compaction and possible filter overhead if the delete set is very large. Practical tip: monitor delete set size and compaction lag; if compliance requires immediate reclaim, look for purge/force or rebuild options.
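One way to act on that tip is a compaction trigger that watches both signals: the fraction of a segment that is deleted (storage and filter overhead) and the age of the oldest unreclaimed delete (compaction lag against a compliance window). The `SegmentStats` fields, the 20% ratio, the 30-day window, and the half-window headroom are hypothetical values for illustration, not defaults of any product.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    total_points: int
    deleted_points: int
    oldest_delete_age_s: float  # age of the oldest delete not yet reclaimed


def needs_compaction(seg, max_deleted_ratio=0.2, policy_window_s=30 * 24 * 3600):
    """Hypothetical policy: compact when a segment carries too much dead weight
    or when an unreclaimed delete is getting close to the policy deadline."""
    ratio = seg.deleted_points / seg.total_points if seg.total_points else 0.0
    too_dead = ratio >= max_deleted_ratio
    # Trigger at half the window to leave headroom for compaction itself.
    near_deadline = (seg.deleted_points > 0
                     and seg.oldest_delete_age_s >= policy_window_s / 2)
    return too_dead or near_deadline


DAY = 24 * 3600
print(needs_compaction(SegmentStats(1000, 300, 60)))       # True  (30% deleted)
print(needs_compaction(SegmentStats(1000, 10, 60)))        # False
print(needs_compaction(SegmentStats(1000, 10, 20 * DAY)))  # True  (past the 15-day trigger)
```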

Frequently Asked Questions

When does a soft-deleted vector get hard-deleted?

During compaction or garbage collection: when segments are merged or the index is rebuilt, deleted points are dropped and space is reclaimed. Timing depends on VDB configuration and compaction policy.
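A sketch of a compaction pass that merges only the segments whose deleted fraction crosses a threshold, dropping deleted points along with the tombstones that pointed at them. Segments are plain lists of point IDs (vectors omitted), and the function name and 0.5 threshold are assumptions for illustration.

```python
def compact(segments, deleted, threshold=0.5):
    """Merge segments whose deleted fraction >= threshold, dropping deleted points.

    segments: list of segments, each a list of point IDs
    deleted:  set of soft-deleted point IDs
    Returns (new_segments, remaining_deleted, reclaimed_count).
    """
    kept, merged, reclaimed = [], [], 0
    for seg in segments:
        if not seg:
            continue  # empty segments are simply dropped
        dead = sum(1 for pid in seg if pid in deleted)
        if dead / len(seg) >= threshold:
            merged.extend(pid for pid in seg if pid not in deleted)
            reclaimed += dead
        else:
            kept.append(seg)  # left untouched; its tombstones stay in force
    if merged:
        kept.append(merged)
    still_stored = {pid for seg in kept for pid in seg}
    remaining = {pid for pid in deleted if pid in still_stored}
    return kept, remaining, reclaimed


segs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
new_segs, still_deleted, freed = compact(segs, deleted={1, 2, 3, 6, 9})
print(new_segs)       # [[5, 6, 7, 8], [4, 10]]
print(still_deleted)  # {6}
print(freed)          # 4
```

The second segment is only 25% deleted, so point 6 stays soft-deleted until a later pass picks that segment up.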

Do deleted vectors still appear in search?

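No. The engine excludes soft-deleted IDs from the candidates it keeps and from the final result set. In graph indexes, deleted nodes may still be traversed as waypoints so the graph stays connected, but they are never returned. From the user’s perspective the vectors are gone; only the reclamation of their storage is deferred.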
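A simplified, single-layer sketch of in-traversal filtering in the style of HNSW’s base-layer search. Positions are 1-D numbers standing in for vectors, and `graph_search`, its `ef` parameter, and the toy chain graph are illustrative assumptions; real HNSW adds upper layers and neighbor-selection heuristics. Deleted nodes are expanded like any other, so paths through them still work, but they never enter the result set.

```python
import heapq


def graph_search(graph, pos, entry, query, k, ef, deleted):
    """Best-first search with a result budget `ef`; deleted nodes are
    traversed but never added to the results."""
    def dist(n):
        return abs(pos[n] - query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap: closest candidate first
    best = [] if entry in deleted else [(-dist(entry), entry)]  # max-heap of live hits
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # no remaining candidate can improve the results
        for nbr in graph[node]:
            if nbr in visited:
                continue
            visited.add(nbr)
            dn = dist(nbr)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nbr))  # expand even if deleted
                if nbr not in deleted:
                    heapq.heappush(best, (-dn, nbr))
                    if len(best) > ef:
                        heapq.heappop(best)  # evict the farthest live hit
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in best)][:k]


pos = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Node 3 is reachable only through deleted node 2, yet it is still found.
print(graph_search(graph, pos, entry=0, query=3.0, k=2, ef=2, deleted={2}))  # [3, 4]
```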

What if I need immediate hard delete for compliance?

Some VDBs support “force” or “purge” options that trigger a more aggressive path (e.g. rebuild affected segment). That can be expensive. Check your vendor’s docs for right-to-delete or GDPR-style guarantees and whether they rely on soft delete + scheduled compaction or support immediate reclaim.
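A sketch of what a purge path might do, assuming segments are plain lists of point IDs keyed by a segment ID; `purge` is a hypothetical helper, not a vendor API. It rewrites only the segment that holds the point, and since that segment is being rebuilt anyway, it also drops any other tombstoned points in it.

```python
def purge(segments, deleted, pid):
    """Immediately hard-delete `pid` by rebuilding only the segment holding it.

    segments: dict of segment ID -> list of point IDs (vectors omitted)
    deleted:  set of soft-deleted point IDs (updated in place)
    Returns the rebuilt segment's ID, or None if `pid` is not stored.
    """
    for seg_id, ids in segments.items():
        if pid in ids:
            dropped = {p for p in ids if p == pid or p in deleted}
            # In a real engine this step re-indexes every surviving point
            # in the segment, which is what makes purge expensive.
            segments[seg_id] = [p for p in ids if p not in dropped]
            deleted.difference_update(dropped)
            return seg_id
    return None


segments = {"a": [1, 2, 3], "b": [4, 5]}
deleted = {2, 5}
print(purge(segments, deleted, 1))  # a
print(segments)                     # {'a': [3], 'b': [4, 5]}
print(deleted)                      # {5}
```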

Does soft delete affect recall or latency?

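Checking an ID against a bitmap is cheap, so filtering adds only a small cost per candidate, although a very large delete list also consumes memory. The larger effect comes when many of a query’s nearest neighbors are deleted: the search must examine more candidates to fill k live results, which raises latency, and with a fixed search budget (e.g. HNSW’s ef) it can return fewer or worse results, lowering recall. Compaction reduces the number of deleted entries over time. For most workloads the impact is small compared to the benefit of fast deletes and simple WAL replay.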
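The recall effect is easiest to see with a fixed candidate budget. In this brute-force sketch, the `budget` parameter is a hypothetical stand-in for a setting like ef, and post-filtering is only one of several possible strategies. The engine fetches a fixed number of nearest candidates and then drops deleted IDs, so heavy deletes near the query leave fewer than k results until the budget is raised.

```python
import math


def post_filtered_topk(points, query, k, budget, deleted):
    """Take the `budget` nearest candidates, then drop deleted IDs.

    points: dict of point ID -> vector
    May return fewer than k IDs when many candidates are deleted.
    """
    ranked = sorted(points, key=lambda pid: math.dist(points[pid], query))
    live = [pid for pid in ranked[:budget] if pid not in deleted]
    return live[:k]


points = {i: [float(i)] for i in range(10)}
print(post_filtered_topk(points, [0.0], k=3, budget=4, deleted=set()))      # [0, 1, 2]
print(post_filtered_topk(points, [0.0], k=3, budget=4, deleted={0, 1, 2}))  # [3]
print(post_filtered_topk(points, [0.0], k=3, budget=6, deleted={0, 1, 2}))  # [3, 4, 5]
```

Raising the budget restores k results but examines more candidates, which is the latency cost; once compaction removes the deleted points, the original budget suffices again.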