Pre-filtering vs. Post-filtering
When you combine metadata filtering with vector search, the filter can be applied before or during the ANN search (pre-filter) or after retrieving candidates (post-filter). The choice affects recall, latency, and how many candidates you need to fetch. This topic compares both approaches and when to use each.
Summary
- With metadata filtering, the filter can be applied before/during ANN (pre-filter) or after retrieval (post-filter); the choice affects recall, latency, and how many candidates to fetch.
- Post-filtering: run vector search, then drop non-matching candidates. Simple and works with any ANN; when the filter is selective you may need to over-fetch (e.g. 10× K) to get K results.
- Pre-filtering: only consider points that pass the filter during traversal; true top-K within the filtered set but harder with ANN. Often implemented via in-bitmap checks or separate indexes per segment.
- When the filter is very selective, pre-filter is usually better; when loose, post-filter is often sufficient and easier.
- Practical tip: measure selectivity of your filters; use post-filter with oversample (e.g. 5–10× K) for quick rollout; move to pre-filter when latency or recall within the filtered set matters.
Post-filtering
Post-filtering runs the vector search over the full index (or a coarse partition), takes the top-K candidates, and then removes any that don’t match the metadata filter. This is simple and works with any ANN index, but if the filter is selective (e.g. matching 1% of the data), most of the top-K candidates may be filtered out and you end up with fewer than K results.
To compensate, you typically request more candidates (e.g. 10× K), then filter and take the top K, at the cost of more distance computations and memory. Pipeline: query vector + filter → ANN returns N candidates (N ≥ K, often N = oversample × K) → apply filter to candidates → return top K. Trade-off: the implementation is simple and index-agnostic, but selective filters require a large oversample factor and increase latency.
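The pipeline above can be sketched in a few lines, with a brute-force scan standing in for the ANN index (the oversample logic is the same either way); the function names and toy data are illustrative, not any particular engine’s API:

```python
import numpy as np

def post_filter_search(query, vectors, metadata, predicate, k, oversample=10):
    """Post-filtering: fetch oversample*k nearest candidates, then filter."""
    n = oversample * k
    # Exact nearest-neighbour scan stands in for an ANN index here.
    dists = np.linalg.norm(vectors - query, axis=1)
    candidates = np.argsort(dists)[:n]            # top-N by distance
    kept = [i for i in candidates if predicate(metadata[i])]
    return kept[:k]                               # may still be < k if the filter is selective

# Toy data: 1000 vectors, every 10th point tagged "en" (~10% selectivity).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8))
metadata = [{"lang": "en" if i % 10 == 0 else "de"} for i in range(1000)]

hits = post_filter_search(vectors[0], vectors, metadata,
                          lambda m: m["lang"] == "en", k=5)
```

With 10% selectivity and a 10× oversample, roughly K of the N candidates survive the filter on average; a tighter filter would need a larger oversample to reliably fill K slots.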
Pre-filtering
Pre-filtering considers only points that pass the filter during graph traversal or cluster search. You never visit ineligible points, so latency can be lower and you get a true top-K within the filtered set. The difficulty is that ANN structures (e.g. HNSW) are built over all vectors; restricting traversal to a subset on the fly can break graph connectivity and hurt recall.
Implementations use in-bitmap checks during traversal or build separate indexes per segment. When the filter is very selective, pre-filtering is usually better; when it’s loose, post-filtering is often sufficient and easier. Trade-off: pre-filter gives correct top-K within the subset and can be faster when selective, but requires index support (e.g. bitmap integration) and can hurt recall if the filtered set is sparse. Practical tip: use metadata cardinality and selectivity to decide; benchmark both on your workload.
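As a sketch, pre-filtering over a flat index reduces to computing distances only for eligible points; in a real engine the boolean mask would be a bitmap consulted during graph traversal rather than a NumPy array, and the names below are illustrative:

```python
import numpy as np

def pre_filter_search(query, vectors, allowed, k):
    """Pre-filtering sketch: compute distances only for points that
    pass the filter. `allowed` plays the role of a bitmap."""
    idx = np.flatnonzero(allowed)                 # eligible points only
    if idx.size == 0:
        return []
    dists = np.linalg.norm(vectors[idx] - query, axis=1)
    order = np.argsort(dists)[:k]                 # true top-k within the filtered set
    return idx[order].tolist()

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8))
allowed = np.arange(1000) % 10 == 0               # ~10% of points pass the filter

hits = pre_filter_search(vectors[0], vectors, allowed, k=5)
```

Unlike the post-filter version, this always returns the exact top-K of the filtered set (when at least K points are eligible), at the price of needing the filter result before the search starts.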
Frequently Asked Questions
What is post-filtering?
Post-filtering runs the vector search over the full index (or a coarse partition), gets top-K candidates, then removes any that don’t match the metadata filter. It is simple and works with any ANN index; when the filter is selective you typically request more candidates (e.g. 10× K) and then filter down to K results. See metadata filtering basics.
What is pre-filtering?
Pre-filtering only considers points that pass the filter during graph traversal or cluster search. You get a true top-K within the filtered set, but it is harder to implement with ANN because ANN structures are built over all vectors. In-bitmap filtering is a common approach.
When should I use pre-filter vs. post-filter?
When the filter is very selective (e.g. 1% of data), pre-filtering is usually better for latency and recall. When the filter is loose, post-filtering is often sufficient and easier. Consider how metadata cardinality affects query performance.
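One way to operationalise this is to estimate selectivity on a metadata sample and pick a strategy from a rough cutoff; the 10% threshold below is an assumption to tune against your own workload, not a universal constant:

```python
def filter_selectivity(metadata, predicate):
    """Fraction of points that pass the filter, estimated on a sample."""
    matched = sum(1 for m in metadata if predicate(m))
    return matched / len(metadata)

def choose_strategy(selectivity, threshold=0.1):
    # Hypothetical cutoff: very selective filters favour pre-filtering;
    # loose filters are usually fine with post-filter + oversample.
    return "pre-filter" if selectivity < threshold else "post-filter"

# Sample where 1 in 100 points matches the filter (~1% selectivity).
sample = [{"lang": "en"} if i % 100 == 0 else {"lang": "de"} for i in range(10_000)]
sel = filter_selectivity(sample, lambda m: m["lang"] == "en")
```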
How do I get enough results when post-filtering?
Request more candidates than K (e.g. 10× K), run the vector search, then filter and take the top K. This increases distance computations and memory; tuning depends on your recall and latency requirements.
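A rough way to size the request from measured selectivity: to expect K survivors among N candidates, N ≈ K / selectivity, plus a safety margin. The safety factor and cap below are illustrative defaults, not recommendations from any particular engine:

```python
import math

def candidates_to_fetch(k, selectivity, safety=2.0, max_factor=100):
    """How many ANN candidates to request so that ~k survive the filter.
    Expected survivors among n candidates ≈ n * selectivity,
    so n ≈ k / selectivity, padded by a safety factor and capped."""
    if selectivity <= 0:
        return max_factor * k                     # filter matched nothing in the sample
    n = math.ceil(safety * k / selectivity)
    return min(n, max_factor * k)
```

For example, K = 10 at 10% selectivity with a 2× safety factor yields a request for 200 candidates; when the cap kicks in, consider switching to pre-filtering instead of growing the oversample further.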