Range queries on metadata

Range queries filter metadata on numeric or otherwise ordered values, for example price > 100, created_at BETWEEN '2024-01-01' AND '2024-12-31', or score >= 0.8. They extend metadata filtering beyond equality and are common in filters such as “recent items,” “within budget,” or “above a confidence threshold.” Support and performance depend on how the engine indexes ordered fields and on whether the range is applied before or after vector search.

Summary

Range queries filter on numeric or ordered metadata (e.g. price > 100, created_at BETWEEN ..., score >= 0.8), extending metadata filtering beyond equality.
Vector databases (VDBs) may use secondary indexes (B-trees, LSM trees) or columnar scans on the filtered fields. With pre-filtering, the range is applied before or during vector search; with post-filtering, it is applied after retrieval. Pre-filtering on a range is harder than on equality, because a range spans many values and cannot be answered from a single per-value bitmap.
Ranges combine with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and on whether secondary indexes exist for the range fields; time-based filters can use partitioning or namespaces.
Trade-off: range indexes add build and storage cost; post-filtering avoids that cost but may require over-fetching. Selective ranges benefit most from pre-filtering.
Practical tip: index fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.

How range filters work

Vector databases may support range filters via secondary indexes (B-trees, LSM trees) or columnar scans on the filtered fields. With pre-filtering, the range is applied before or during vector search, so only candidates inside the range are considered; with post-filtering, results are filtered after retrieval. Pre-filtering on a range is harder than on equality because the candidate set is not a single precomputed bitmap: building it may require an index scan or a dedicated range index. See “Why pre-filtering is hard for ANN indexes” for the structural reasons.

Pipeline: parse the range predicate → use a B-tree/range index or a columnar scan to get candidate IDs (or a bitmap, for discrete buckets) → intersect the candidates with the vector index traversal, or apply the predicate after the ANN search.
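
The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not any particular engine's implementation: a sorted list with binary search stands in for a B-tree, an exact scan stands in for the ANN index, and the corpus, field names, and function names are all made up for the example.

```python
import bisect
import math

# Toy corpus: id -> (vector, price). All names and values are illustrative.
ITEMS = {
    0: ([1.0, 0.0], 50),
    1: ([0.9, 0.1], 120),
    2: ([0.0, 1.0], 300),
    3: ([0.8, 0.2], 700),
    4: ([0.7, 0.3], 150),
}

# Secondary index on price: (price, id) pairs kept sorted, standing in for a B-tree.
PRICE_INDEX = sorted((price, item_id) for item_id, (_, price) in ITEMS.items())
PRICE_KEYS = [price for price, _ in PRICE_INDEX]


def ids_in_range(low, high):
    """Return ids with low <= price <= high using two binary searches."""
    start = bisect.bisect_left(PRICE_KEYS, low)
    end = bisect.bisect_right(PRICE_KEYS, high)
    return {item_id for _, item_id in PRICE_INDEX[start:end]}


def search(query, k, candidates=None):
    """Exact nearest-neighbour search (stand-in for ANN), optionally restricted to candidates."""
    pool = ITEMS if candidates is None else candidates
    ranked = sorted(pool, key=lambda i: math.dist(query, ITEMS[i][0]))
    return ranked[:k]


def pre_filter_search(query, k, low, high):
    """Apply the range first, then search only among the candidates in range."""
    return search(query, k, candidates=ids_in_range(low, high))


def post_filter_search(query, k, low, high):
    """Search first, then drop hits outside the range (may return fewer than k)."""
    hits = search(query, k)
    return [i for i in hits if low <= ITEMS[i][1] <= high]
```

With the query [1.0, 0.0], k = 2 and the range 100 to 500, the pre-filtered search returns two in-range items, while the post-filtered search keeps only one of its two hits because the nearest item overall is priced outside the range.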

Combining ranges and performance

Range filters can be combined with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and whether the engine has secondary indexes for the range fields.
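
As a rough sketch of how such a combined predicate might be evaluated, the example below resolves the equality part from a per-value id set and the range part from a sorted index, then intersects the two. The records, field names, and helper functions are hypothetical, and real engines differ in how they plan and order the predicates.

```python
import bisect

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"id": 0, "category": "X", "price": 120},
    {"id": 1, "category": "Y", "price": 80},
    {"id": 2, "category": "X", "price": 650},
    {"id": 3, "category": "X", "price": 499},
    {"id": 4, "category": "Z", "price": 20},
]

# Equality index: category -> set of ids (the simple per-value case).
CATEGORY_INDEX = {}
for rec in RECORDS:
    CATEGORY_INDEX.setdefault(rec["category"], set()).add(rec["id"])

# Range index: (price, id) pairs sorted by price.
PRICE_INDEX = sorted((rec["price"], rec["id"]) for rec in RECORDS)
PRICES = [price for price, _ in PRICE_INDEX]


def price_below(limit):
    """ids with price < limit (strict), located by binary search."""
    end = bisect.bisect_left(PRICES, limit)
    return {item_id for _, item_id in PRICE_INDEX[:end]}


def selectivity(ids):
    """Fraction of the collection that a predicate keeps (lower = more selective)."""
    return len(ids) / len(RECORDS)


def match(category, max_price):
    """Evaluate: category = <category> AND price < <max_price>."""
    by_category = CATEGORY_INDEX.get(category, set())
    by_price = price_below(max_price)
    return by_category & by_price
```

Here match("X", 500) yields ids 0 and 3. The selectivity helper shows the quantity that matters for planning: a predicate that keeps 80% of the collection prunes little, while one that keeps a few percent makes pre-filtering worthwhile.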

For time-based filters (e.g. last 7 days), partitioning by time or using namespaces can help limit the search space. Trade-off: range indexes add build time and storage; post-filtering avoids that cost but may require requesting a larger k so that enough results remain after filtering. Practical tip: index the fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.
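
The over-fetch side of that trade-off can be illustrated with a small loop that keeps enlarging the fetch size until k results survive the range filter. This is a sketch under stated assumptions: the exact scan stands in for an ANN call, and the data, the growth factor, and the function names are invented for the example.

```python
import math

# Toy corpus: id -> (vector, score). Values are illustrative only.
ITEMS = {
    0: ([1.0, 0.0], 0.30),
    1: ([0.9, 0.1], 0.50),
    2: ([0.8, 0.2], 0.90),
    3: ([0.5, 0.5], 0.85),
    4: ([0.0, 1.0], 0.95),
}


def top_k(query, k):
    """Exact nearest neighbours, standing in for an ANN index call."""
    ranked = sorted(ITEMS, key=lambda i: math.dist(query, ITEMS[i][0]))
    return ranked[:k]


def post_filter_search(query, k, min_score, growth=2):
    """Post-filter on score >= min_score, enlarging the fetch until k hits survive.

    Returns the surviving ids and the final fetch size that was needed.
    """
    fetch = k
    while True:
        hits = top_k(query, fetch)
        kept = [i for i in hits if ITEMS[i][1] >= min_score]
        # Stop once enough hits survive or the whole collection has been fetched.
        if len(kept) >= k or fetch >= len(ITEMS):
            return kept[:k], fetch
        fetch = min(len(ITEMS), fetch * growth)
```

For the query [1.0, 0.0] with k = 2 and score >= 0.8, the first fetch of 2 yields nothing in range, and the loop has to fetch 4 neighbours to return 2 results. The more selective the range, the larger the fetch grows, which is where a range index and pre-filtering start to pay off.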

Frequently Asked Questions

What is a range query on metadata?

It is a filter on numeric or otherwise ordered values, e.g. price > 100, created_at BETWEEN ..., or score >= 0.8. It extends metadata filtering beyond equality to cover cases like “recent items,” “within budget,” or confidence thresholds.

How do VDBs implement range filters?

Often via secondary indexes (B-trees, LSM trees) or columnar scans on the filtered fields. With pre-filtering, the range is applied before or during search; with post-filtering, it is applied after retrieval. See “Why pre-filtering is hard for ANN indexes” for the structural reasons.

Can I combine range with other conditions?

Yes. Use boolean expressions (e.g. category = 'X' AND price < 500). Secondary indexes on range fields improve performance.

What about time-based filters?

Partitioning by time or using namespaces can limit the search space for “last 7 days” style filters. Check whether your vector database supports time partitioning and indexes on timestamp fields.
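
A minimal sketch of the partitioning idea, assuming one partition per calendar day: a “last N days” filter then touches only N partitions, however large the rest of the collection is. The class and method names are hypothetical and do not correspond to any specific product's API.

```python
from collections import defaultdict
from datetime import date, timedelta


class DailyPartitions:
    """Groups item ids by creation day so a time range maps to a few partitions."""

    def __init__(self):
        self._by_day = defaultdict(list)

    def add(self, item_id, created):
        """Place the item in the partition for its creation date."""
        self._by_day[created].append(item_id)

    def candidates(self, start, end):
        """ids created between start and end (inclusive), visiting only those days."""
        out = []
        day = start
        while day <= end:
            out.extend(self._by_day.get(day, []))
            day += timedelta(days=1)
        return out

    def last_n_days(self, today, n):
        """ids from the n most recent days, counting today as day one."""
        return self.candidates(today - timedelta(days=n - 1), today)
```

The ids returned by last_n_days would then be handed to the vector search as its candidate set, or each daily partition could hold its own vector index that is searched separately.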