Range queries on metadata.
Range queries filter metadata by numeric or otherwise ordered values, e.g. price > 100, created_at BETWEEN '2024-01-01' AND '2024-12-31', or score >= 0.8. They extend metadata filtering beyond equality and are common for filters like "recent items," "within budget," or "above a confidence threshold." Support and performance depend on how the engine indexes ordered fields and on whether the range is applied before or after vector search.
Summary
- Range queries filter on numeric or ordered metadata (e.g. price > 100, created_at BETWEEN ..., score >= 0.8), extending metadata filtering beyond equality.
- VDBs may use secondary indexes (B-trees, LSM, columnar) on filtered fields. With pre-filtering the range is applied before/during vector search; with post-filtering after retrieval. Pre-filter with ranges is harder than equality (not a simple bitmap).
- Ranges combine with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and secondary indexes; time-based filters can use partitioning or namespaces.
- Trade-off: range indexes add build and storage cost; post-filtering avoids that but may require over-fetch. Selective ranges benefit most from pre-filter.
- Practical tip: index fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.
How range filters work
Vector DBs may support range filters via secondary structures on the filtered fields: B-trees, LSM trees, or columnar scans. With pre-filtering, the range is applied before or during vector search so that only candidates in range are considered; with post-filtering, results are filtered after retrieval. Pre-filtering with ranges is harder than with equality because the candidate set cannot be read from a single precomputed bitmap per value; it may require an index scan or a dedicated range index. See why pre-filtering is hard for ANN indexes for the structural reasons. The pipeline is: parse the range predicate, use a B-tree/range index or a columnar scan to get candidate IDs (or a bitmap, for discrete buckets), then either intersect those candidates with the vector index traversal or apply the predicate after ANN search.
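The pre-filter path of that pipeline can be sketched in a few lines. This is a minimal illustration and not how any particular engine works: `RangeIndex` is a hypothetical stand-in for a B-tree (a sorted list searched with `bisect`), and `prefiltered_search` uses brute-force cosine similarity in place of a real ANN traversal.

```python
import bisect
import math


class RangeIndex:
    """Sorted (value, doc_id) pairs; a toy stand-in for a B-tree on one field."""

    def __init__(self, values_by_id):
        self._entries = sorted((value, doc_id) for doc_id, value in values_by_id.items())
        self._keys = [value for value, _ in self._entries]

    def ids_between(self, low, high):
        """Return the IDs whose value satisfies low <= value <= high."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return {doc_id for _, doc_id in self._entries[start:end]}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefiltered_search(query, vectors, candidate_ids, k):
    """Score only the candidates that passed the range filter (brute force)."""
    scored = sorted(((cosine(query, vectors[i]), i) for i in candidate_ids), reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical data: four documents with 2-d embeddings and a price field.
vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [0.8, 0.2]}
prices = {1: 50, 2: 150, 3: 300, 4: 700}

price_index = RangeIndex(prices)
candidates = price_index.ids_between(100, 500)  # price BETWEEN 100 AND 500
top = prefiltered_search([1.0, 0.0], vectors, candidates, k=2)
```

Documents 1 and 4 are closer to the query than document 3, but they never reach the scoring step because the range lookup has already excluded them. In a real engine the candidate set would be intersected with graph or cluster traversal rather than scanned exhaustively.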
Combining ranges and performance
Range filters can be combined with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and whether the engine has secondary indexes for the range fields.
For time-based filters (e.g. last 7 days), partitioning by time or using namespaces can help limit the search space. Trade-off: range indexes add build time and storage; post-filtering avoids that cost but may require requesting a larger k to get enough results after filtering. Practical tip: index all fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.
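The over-fetch cost of post-filtering can be illustrated with a small sketch. Everything here is an assumption made for illustration: `ann_search` is a brute-force stand-in for a real ANN index, and the doubling strategy in `post_filtered_search` is one simple policy among several an engine or client might use.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ann_search(query, vectors, k):
    """Stand-in for an ANN index: exact nearest neighbours by brute force."""
    ranked = sorted(vectors, key=lambda doc_id: distance(query, vectors[doc_id]))
    return ranked[:k]


def post_filtered_search(query, vectors, metadata, predicate, k, overfetch=2):
    """Fetch more than k, filter, and widen the fetch until k results survive."""
    fetch = k * overfetch
    while True:
        hits = ann_search(query, vectors, fetch)
        kept = [doc_id for doc_id in hits if predicate(metadata[doc_id])]
        if len(kept) >= k or fetch >= len(vectors):
            return kept[:k]
        fetch = min(fetch * 2, len(vectors))


# Hypothetical data: six documents at increasing distance from the query.
vectors = {i: [0.1 * i, 0.0] for i in range(1, 7)}
metadata = {
    1: {"category": "Y", "price": 100},
    2: {"category": "X", "price": 900},
    3: {"category": "X", "price": 200},
    4: {"category": "Y", "price": 50},
    5: {"category": "X", "price": 450},
    6: {"category": "X", "price": 300},
}


def in_budget_x(meta):
    # category = 'X' AND price < 500
    return meta["category"] == "X" and meta["price"] < 500


results = post_filtered_search([0.0, 0.0], vectors, metadata, in_budget_x, k=2)
```

With k=2 the first fetch of four neighbours yields only one match, so the search has to run again with a wider fetch. The less selective the ANN results are with respect to the predicate, the more of this repeated work post-filtering incurs.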
Frequently Asked Questions
What is a range query on metadata?
Filtering by numeric or ordered values: e.g. price > 100, created_at BETWEEN ..., score >= 0.8. Extends metadata filtering beyond equality for “recent items,” “within budget,” or confidence thresholds.
How do VDBs implement range filters?
Often via secondary indexes (B-trees, LSM, or columnar scans) on the filtered fields. With pre-filtering the range is applied before/during search; with post-filtering after retrieval. See why pre-filtering is hard for ANN for the structural reasons.
Can I combine range with other conditions?
Yes. Use boolean expressions (e.g. category = 'X' AND price < 500). Secondary indexes on range fields improve performance.
What about time-based filters?
Partitioning by time or using namespaces can limit the search space for “last 7 days” style filters. Check your VDB for time-partitioning and index support.
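As a rough sketch of the idea, assuming one partition per calendar day (real engines and namespace schemes differ, and the helper names here are hypothetical), a "last 7 days" filter only needs to touch the partitions that fall inside the window:

```python
from collections import defaultdict
from datetime import date, timedelta


def build_partitions(created_at_by_id):
    """Group document IDs into one partition per creation date."""
    partitions = defaultdict(list)
    for doc_id, created in created_at_by_id.items():
        partitions[created].append(doc_id)
    return partitions


def ids_in_last_days(partitions, today, days):
    """Collect IDs from partitions dated after (today - days), up to today."""
    cutoff = today - timedelta(days=days)
    recent = []
    for day in sorted(partitions):
        if cutoff < day <= today:
            recent.extend(partitions[day])
    return recent


# Hypothetical creation dates for five documents.
created = {
    1: date(2024, 6, 10),
    2: date(2024, 6, 4),
    3: date(2024, 6, 3),
    4: date(2024, 5, 1),
    5: date(2024, 6, 10),
}

partitions = build_partitions(created)
recent = ids_in_last_days(partitions, today=date(2024, 6, 10), days=7)
```

Vector search would then run only over the returned IDs (or only over the matching partitions' indexes), so older data is skipped without evaluating a range predicate per document.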