Filtering & Querying · Topic 129

Bitmaps and Bitsets for filtering

A bitmap (or bitset) is a compact array of bits, one per point: bit i is 1 if point i satisfies the current metadata filter, 0 otherwise. Vector databases use bitmaps to quickly decide “is this point eligible?” during pre-filtering or graph traversal without loading full payloads. This topic covers how bitmaps are built, combined, and used with soft deletes.

Summary

One bit per point: 1 = passes metadata filter, 0 = ineligible. Used during pre-filtering and graph traversal without loading payloads.
Built from indexed metadata; AND/OR conditions become bitwise AND/OR on bitmaps. When visiting a graph node, check the bit for that ID; if 0, skip.
Dense (one bit per vector), so scale to millions with minimal memory. Also used for soft deletes (bit = 0 = deleted), combined with filter bitmap.
Trade-off: fast eligibility check and low memory vs. build cost for complex or high-cardinality filters (paid once per query).
Practical tip: index filter fields so bitmap build is cheap; combine filter bitmap with delete bitmap (AND with inverse of delete) for a single eligibility mask.

How bitmaps are built and used

Bitmaps are built from indexed metadata: for a filter like category = "news", the engine (or a precomputed index) sets bits for all point IDs that have category = "news". Multiple conditions (AND/OR) become bitwise AND/OR on the bitmaps, which is very fast. The result is a single bitmap that the search layer uses: when visiting a node in an HNSW graph, check the bit for that node’s ID; if 0, skip. That’s the core of in-bitmap filtering during traversal.

Pipeline: query arrives with filter → build bitmap from indexed metadata (per-value bitmaps OR/AND as needed) → optionally AND with inverse of delete bitmap → pass bitmap to search; during traversal each candidate ID is checked with a single bit lookup. Trade-off: bitmap build is O(eligible set) or O(1) if precomputed per segment; very high-cardinality filters may need a scan.

Memory and soft deletes

Bitmaps are dense (one bit per vector) so they scale to millions of points with minimal memory. For high-cardinality or complex filters, building the bitmap may require scanning an index or merging several per-value bitmaps; the cost is paid once per query.

Some systems also use bitmaps to represent soft deletes (bit = 0 means deleted), and combine the delete bitmap with the filter bitmap so that deleted points are excluded in the same way as ineligible ones. Practical tip: keep bitmaps word-aligned for SIMD; cache per-value bitmaps when filter values repeat across queries.

Frequently Asked Questions

What is a bitmap used for in a vector DB?

To represent which points satisfy the current metadata filter: one bit per point ID. The search layer uses it during pre-filtering or graph traversal to skip ineligible points without loading full payloads.

How are AND and OR filters combined with bitmaps?

Each condition (e.g. category = "news", price < 100) can produce a bitmap. AND = bitwise AND of those bitmaps; OR = bitwise OR. The result is one bitmap that the vector search uses to decide eligibility.

Why use bitmaps instead of loading metadata for each candidate?

Bitmaps are dense (one bit per vector) and bitwise ops are very fast. Checking one bit per candidate during traversal is cheaper than loading and evaluating full metadata for every node. See in-bitmap filtering during graph traversal.

Can bitmaps represent deleted vectors?

Yes. Many systems use a “delete” bitmap where bit = 0 means the point is soft-deleted. The filter bitmap is combined with the inverse of the delete bitmap so deleted points are excluded like ineligible ones.