Filtering & Querying · Topic 132

Range queries on metadata.

Range queries filter metadata on numeric or otherwise ordered values, e.g. price > 100, created_at BETWEEN '2024-01-01' AND '2024-12-31', or score >= 0.8. They extend metadata filtering beyond equality and are common for filters like “recent items,” “within budget,” or “above a confidence threshold.” Support and performance depend on how the engine indexes ordered fields and on whether ranges are applied before or after vector search.

Summary

  • Range queries filter on numeric or ordered metadata (e.g. price > 100, created_at BETWEEN ..., score >= 0.8), extending metadata filtering beyond equality.
  • Vector databases (VDBs) may use secondary indexes (B-trees, LSM trees, or columnar scans) on filtered fields. With pre-filtering, the range is applied before or during vector search; with post-filtering, it is applied after retrieval. Pre-filtering with ranges is harder than with equality because the candidate set is not a simple bitmap.
  • Ranges combine with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and secondary indexes; time-based filters can use partitioning or namespaces.
  • Trade-off: range indexes add build and storage cost; post-filtering avoids that but may require over-fetch. Selective ranges benefit most from pre-filter.
  • Practical tip: index fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.

How range filters work

Vector DBs may support range filters via secondary indexes (B-trees, LSM trees, or columnar scans) on the filtered fields. With pre-filtering, the range is applied before or during vector search so only candidates in range are considered; with post-filtering, results are filtered after retrieval. Pre-filtering with ranges is harder than with equality because the candidate set isn’t a simple bitmap; it may require index scans or range indexes. See “Why pre-filtering is hard for ANN indexes” for the structural reasons. The pipeline is: parse the range predicate → use a B-tree/range index or columnar scan to get candidate IDs (or a bitmap for discrete buckets) → intersect the candidates with the vector index traversal, or apply the filter after ANN retrieval.
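The pre-filtering path of that pipeline can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine implements it: the data, the names ITEMS, PRICE_INDEX, ids_in_range, and prefiltered_search are all hypothetical, a sorted list stands in for a B-tree, and exact cosine ranking stands in for ANN traversal.

```python
import bisect
import math

# Hypothetical toy data: id -> (vector, price). Not taken from any real engine.
ITEMS = {
    1: ([1.0, 0.0], 50),
    2: ([0.9, 0.1], 150),
    3: ([0.0, 1.0], 300),
    4: ([0.8, 0.2], 700),
}

# Stand-in for a B-tree: (price, id) pairs kept in sorted order.
PRICE_INDEX = sorted((price, item_id) for item_id, (_, price) in ITEMS.items())


def ids_in_range(low, high):
    """Return ids with low <= price <= high using two binary searches."""
    start = bisect.bisect_left(PRICE_INDEX, (low, -math.inf))
    end = bisect.bisect_right(PRICE_INDEX, (high, math.inf))
    return {item_id for _, item_id in PRICE_INDEX[start:end]}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def prefiltered_search(query, low, high, k):
    """Pre-filter: restrict candidates to the range, then rank by similarity."""
    candidates = ids_in_range(low, high)
    ranked = sorted(
        candidates, key=lambda i: cosine(query, ITEMS[i][0]), reverse=True
    )
    return ranked[:k]
```

Calling prefiltered_search([1.0, 0.0], 100, 500, 2) only ever scores items 2 and 3, because items 1 and 4 fall outside the price range before any similarity is computed.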

Combining ranges and performance

Range filters can be combined with boolean expressions (e.g. category = 'X' AND price < 500). Performance depends on selectivity and whether the engine has secondary indexes for the range fields.
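As a rough illustration of why selectivity matters, the sketch below evaluates category = 'X' AND price < 500 against two toy indexes and probes the smaller candidate set first. Everything here is assumed for the example: ROWS, CATEGORY_INDEX, PRICES, PRICE_IDS, and both functions are hypothetical, and real engines choose their own evaluation order.

```python
import bisect

# Hypothetical metadata rows: id -> (category, price).
ROWS = {1: ("X", 120), 2: ("X", 800), 3: ("Y", 300), 4: ("X", 450), 5: ("Y", 90)}

# Equality index: category -> set of ids (plays the role of a bitmap).
CATEGORY_INDEX = {}
for row_id, (category, _) in ROWS.items():
    CATEGORY_INDEX.setdefault(category, set()).add(row_id)

# Range index: prices in sorted order, with ids in a parallel list.
_sorted_rows = sorted((price, row_id) for row_id, (_, price) in ROWS.items())
PRICES = [price for price, _ in _sorted_rows]
PRICE_IDS = [row_id for _, row_id in _sorted_rows]


def ids_price_below(limit):
    """Ids with price < limit, found by a binary search on the sorted prices."""
    return set(PRICE_IDS[: bisect.bisect_left(PRICES, limit)])


def category_and_price_below(category, limit):
    """Evaluate category = ? AND price < ?, iterating over the smaller set."""
    by_category = CATEGORY_INDEX.get(category, set())
    by_price = ids_price_below(limit)
    small, large = sorted((by_category, by_price), key=len)
    return {i for i in small if i in large}
```

The more selective predicate produces the smaller set, so iterating over it keeps the intersection cheap regardless of which condition the query lists first.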

For time-based filters (e.g. last 7 days), partitioning by time or using namespaces can help limit the search space. Trade-off: range indexes add build time and storage; post-filtering avoids that cost but may require requesting a larger k (over-fetching) so that enough results remain after filtering. Practical tip: index the fields used in range predicates; prefer bounded ranges (BETWEEN, <= / >=) when the engine can use them efficiently.
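The over-fetch side of that trade-off might look like the following sketch. It assumes a hypothetical CORPUS, uses an exact nearest-neighbour scan as a stand-in for an ANN call, and doubles the fetch size whenever too few results survive the score >= min_score filter; the overfetch factor and the doubling policy are illustrative choices, not a recommendation from any engine.

```python
import math

# Hypothetical corpus: (id, vector, score); score is the range-filtered field.
CORPUS = [
    (1, [1.0, 0.0], 0.95),
    (2, [0.9, 0.1], 0.40),
    (3, [0.8, 0.2], 0.85),
    (4, [0.7, 0.3], 0.30),
    (5, [0.6, 0.4], 0.90),
]


def nearest(query, n):
    """Stand-in for an ANN call: exact top-n by Euclidean distance."""
    return sorted(CORPUS, key=lambda row: math.dist(query, row[1]))[:n]


def postfiltered_search(query, min_score, k, overfetch=2):
    """Fetch k * overfetch neighbours, filter, and widen the fetch if short."""
    n = k * overfetch
    while True:
        hits = [row[0] for row in nearest(query, n) if row[2] >= min_score]
        # Stop once k results survive or the whole corpus has been examined.
        if len(hits) >= k or n >= len(CORPUS):
            return hits[:k]
        n *= 2
```

With overfetch=1 and k=2, the first fetch returns only one item that passes the filter, so the function has to issue a second, wider fetch. That extra round trip is the cost post-filtering pays when the range is selective.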

Frequently Asked Questions

What is a range query on metadata?

Filtering by numeric or ordered values: e.g. price > 100, created_at BETWEEN ..., score >= 0.8. Extends metadata filtering beyond equality for “recent items,” “within budget,” or confidence thresholds.

How do VDBs implement range filters?

Often via secondary indexes (B-trees, LSM trees, or columnar scans) on the filtered fields. With pre-filtering, the range is applied before or during search; with post-filtering, it is applied after retrieval. See “Why pre-filtering is hard for ANN indexes” for the structural reasons.

Can I combine range with other conditions?

Yes. Use boolean expressions (e.g. category = 'X' AND price < 500). Secondary indexes on range fields improve performance.

What about time-based filters?

Partitioning by time or using namespaces can limit the search space for “last 7 days” style filters. Check your VDB for time-partitioning and index support.
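To make the partitioning idea concrete, here is a small sketch that assumes one partition per calendar day. RECORDS, PARTITIONS, and both functions are hypothetical; actual partition granularity and APIs vary by engine.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical records: (id, created_at). The partition key is the calendar date.
RECORDS = [
    (1, date(2024, 6, 1)),
    (2, date(2024, 6, 9)),
    (3, date(2024, 6, 12)),
    (4, date(2024, 6, 14)),
]

PARTITIONS = defaultdict(list)
for record_id, created_at in RECORDS:
    PARTITIONS[created_at].append(record_id)


def partitions_for_last_days(today, days):
    """Existing partition keys covering the last `days` days, including today."""
    wanted = (today - timedelta(days=offset) for offset in range(days))
    return [day for day in wanted if day in PARTITIONS]


def candidate_ids(today, days):
    """Only ids from the matching partitions would be passed to vector search."""
    matching = partitions_for_last_days(today, days)
    return sorted(i for day in matching for i in PARTITIONS[day])
```

A “last 7 days” query issued on 2024-06-14 touches three partitions and never reads the 2024-06-01 partition, which is the search-space reduction the answer above describes.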