Pre-filtering vs. Post-filtering
When you combine metadata filtering with vector search, the filter can be applied before or during the ANN search (pre-filter) or after retrieving candidates (post-filter). The choice affects recall, latency, and how many candidates you need to fetch. This topic compares both approaches and when to use each.
Summary
- With metadata filtering, the filter can be applied before/during ANN (pre-filter) or after retrieval (post-filter); the choice affects recall, latency, and how many candidates to fetch.
- Post-filtering: run vector search, then drop non-matching candidates. Simple and works with any ANN; when the filter is selective you may need to over-fetch (e.g. 10× K) to get K results.
- Pre-filtering: only consider points that pass the filter during traversal; true top-K within the filtered set but harder with ANN. Often implemented via in-bitmap checks or separate indexes per segment.
- When the filter is very selective, pre-filter is usually better; when loose, post-filter is often sufficient and easier.
- Practical tip: measure selectivity of your filters; use post-filter with oversample (e.g. 5–10× K) for quick rollout; move to pre-filter when latency or recall within the filtered set matters.
Post-filtering
Post-filtering runs the vector search over the full index (or a coarse partition), takes the top-K candidates, and then removes any that don’t match the metadata filter. This is simple and works with any ANN index, but if the filter is selective (e.g. matching 1% of the data), most of the top-K candidates may be filtered out and you end up with fewer than K results.
To compensate, you typically request more candidates (e.g. 10× K), then filter and take the top K, at the cost of more distance computations and memory. Pipeline: query vector + filter → ANN returns N candidates (N ≥ K, often N = oversample × K) → apply filter to candidates → return top K. Trade-off: the implementation is simple and index-agnostic, but selective filters require a large oversample factor and increase latency.
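The pipeline above can be sketched in a few lines, with a brute-force scan standing in for the ANN index (the oversample logic is the same either way); the function names and toy data are illustrative, not any particular engine’s API:

```python
import numpy as np

def post_filter_search(query, vectors, metadata, predicate, k, oversample=10):
    """Post-filtering: fetch oversample*k nearest candidates, then filter."""
    n = oversample * k
    # Exact nearest-neighbour scan stands in for an ANN index here.
    dists = np.linalg.norm(vectors - query, axis=1)
    candidates = np.argsort(dists)[:n]            # top-N by distance
    kept = [i for i in candidates if predicate(metadata[i])]
    return kept[:k]                               # may still be < k if the filter is selective

# Toy data: 1000 vectors, every 10th point tagged "en" (~10% selectivity).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8))
metadata = [{"lang": "en" if i % 10 == 0 else "de"} for i in range(1000)]

hits = post_filter_search(vectors[0], vectors, metadata,
                          lambda m: m["lang"] == "en", k=5)
```

With 10% selectivity and a 10× oversample, roughly K of the N candidates survive the filter on average; a tighter filter would need a larger oversample to reliably fill K slots.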
Pre-filtering
Pre-filtering considers only points that pass the filter during graph traversal or cluster search. You never visit ineligible points, so latency can be lower and you get a true top-K within the filtered set. The difficulty is that ANN structures (e.g. HNSW) are built over all vectors; restricting traversal to a subset on the fly can break graph connectivity and hurt recall.
Implementations use in-bitmap checks during traversal or build separate indexes per segment. When the filter is very selective, pre-filtering is usually better; when it’s loose, post-filtering is often sufficient and easier. Trade-off: pre-filter gives correct top-K within the subset and can be faster when selective, but requires index support (e.g. bitmap integration) and can hurt recall if the filtered set is sparse. Practical tip: use metadata cardinality and selectivity to decide; benchmark both on your workload.
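As a sketch, pre-filtering over a flat index reduces to computing distances only for eligible points; in a real engine the boolean mask would be a bitmap consulted during graph traversal rather than a NumPy array, and the names below are illustrative:

```python
import numpy as np

def pre_filter_search(query, vectors, allowed, k):
    """Pre-filtering sketch: compute distances only for points that
    pass the filter. `allowed` plays the role of a bitmap."""
    idx = np.flatnonzero(allowed)                 # eligible points only
    if idx.size == 0:
        return []
    dists = np.linalg.norm(vectors[idx] - query, axis=1)
    order = np.argsort(dists)[:k]                 # true top-k within the filtered set
    return idx[order].tolist()

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8))
allowed = np.arange(1000) % 10 == 0               # ~10% of points pass the filter

hits = pre_filter_search(vectors[0], vectors, allowed, k=5)
```

Unlike the post-filter version, this always returns the exact top-K of the filtered set (when at least K points are eligible), at the price of needing the filter result before the search starts.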
Frequently Asked Questions
What is post-filtering?
Post-filtering runs the vector search over the full index (or a coarse partition), gets top-K candidates, then removes any that don’t match the metadata filter. It is simple and works with any ANN index; when the filter is selective you typically request more candidates (e.g. 10× K) and then filter down to K results. See metadata filtering basics.
What is pre-filtering?
Pre-filtering only considers points that pass the filter during graph traversal or cluster search. You get a true top-K within the filtered set, but it is harder to implement with ANN because ANN structures are built over all vectors. In-bitmap filtering is a common approach.
When should I use pre-filter vs. post-filter?
When the filter is very selective (e.g. 1% of data), pre-filtering is usually better for latency and recall. When the filter is loose, post-filtering is often sufficient and easier. Consider how metadata cardinality affects query performance.
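One way to operationalise this is to estimate selectivity on a metadata sample and pick a strategy from a rough cutoff; the 10% threshold below is an assumption to tune against your own workload, not a universal constant:

```python
def filter_selectivity(metadata, predicate):
    """Fraction of points that pass the filter, estimated on a sample."""
    matched = sum(1 for m in metadata if predicate(m))
    return matched / len(metadata)

def choose_strategy(selectivity, threshold=0.1):
    # Hypothetical cutoff: very selective filters favour pre-filtering;
    # loose filters are usually fine with post-filter + oversample.
    return "pre-filter" if selectivity < threshold else "post-filter"

# Sample where 1 in 100 points matches the filter (~1% selectivity).
sample = [{"lang": "en"} if i % 100 == 0 else {"lang": "de"} for i in range(10_000)]
sel = filter_selectivity(sample, lambda m: m["lang"] == "en")
```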
How do I get enough results when post-filtering?
Request more candidates than K (e.g. 10× K), run the vector search, then filter and take the top K. This increases distance computations and memory; tuning depends on your recall and latency requirements.
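A rough way to size the request from measured selectivity: to expect K survivors among N candidates, N ≈ K / selectivity, plus a safety margin. The safety factor and cap below are illustrative defaults, not recommendations from any particular engine:

```python
import math

def candidates_to_fetch(k, selectivity, safety=2.0, max_factor=100):
    """How many ANN candidates to request so that ~k survive the filter.
    Expected survivors among n candidates ≈ n * selectivity,
    so n ≈ k / selectivity, padded by a safety factor and capped."""
    if selectivity <= 0:
        return max_factor * k                     # filter matched nothing in the sample
    n = math.ceil(safety * k / selectivity)
    return min(n, max_factor * k)
```

For example, K = 10 at 10% selectivity with a 2× safety factor yields a request for 200 candidates; when the cap kicks in, consider switching to pre-filtering instead of growing the oversample further.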