
Filtering & Querying · Topic 138

Sorting by metadata vs. sorting by distance.

Vector search naturally returns results ordered by distance (or similarity), which is the primary ranking. Sometimes you want to order by metadata instead, or as a secondary key: e.g. “nearest neighbors, then by date descending” or “filter by category, then sort by price.” Sorting by distance is what the index is built for; sorting by metadata requires either post-processing or indexes that support it. Combined order-by (distance and metadata) is supported to varying degrees across vector databases (VDBs).

Summary

  • Vector search returns results ordered by distance (similarity). Ordering by metadata (e.g. date, price), as in “nearest, then by date” or “filter by category, sort by price,” requires post-processing or indexes that support it.
  • Sort by distance only: index returns list already ordered. Sort by metadata only (with filter): metadata query with optional vector filter/score; sort key may need a secondary index. Distance then metadata (tie-break): get top-k by distance, then sort by metadata in app.
  • “Metadata first, then distance” is harder: filter by metadata then run vector search, or run vector search then sort by metadata (which can reorder away from pure distance). Some systems support combined order-by; see range queries on metadata and pagination in ANN search.
  • Pipeline: get top-k by distance then optionally sort by metadata in app; or filter by metadata then vector search. Trade-off: metadata sort can reorder away from distance; combined order-by depends on engine. Practical tip: use distance-first then metadata tie-break when possible.

Sort options

  • Sort by distance only: The ANN index returns a list already ordered by distance; no extra sort step is needed.
  • Sort by metadata only (with filter): You’re effectively running a filtered metadata query, optionally using the vector to filter or score; the sort key is the metadata field (e.g. created_at, price), which may require a secondary index.
  • Sort by distance then by metadata (tie-break): After getting top-k by distance, sort tied items by a metadata field in application code, or reorder the whole list by that field (at which point it is a full metadata sort rather than a tie-break).
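The tie-break option can be sketched in a few lines, assuming the index has already returned hits sorted by distance. The `Hit` class and its `created_at` field are hypothetical stand-ins for whatever your engine returns. Because exact floating-point ties are rare, this sketch rounds distances to a few decimal places so near-equal scores count as ties; that tolerance is an assumption to tune, and rounding has bucket edges (0.124 and 0.126 land in different buckets). A second function shows the full reorder, which discards distance order entirely.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    id: str
    distance: float  # smaller = closer; the index already returns hits in this order
    created_at: int  # hypothetical metadata field, e.g. a Unix timestamp


def distance_then_newest(hits: list[Hit], digits: int = 2) -> list[Hit]:
    """Primary key: distance rounded to `digits` places (ascending), so near-equal
    scores count as ties. Secondary key: created_at descending (newest first)."""
    return sorted(hits, key=lambda h: (round(h.distance, digits), -h.created_at))


def newest_only(hits: list[Hit]) -> list[Hit]:
    """Full reorder: ignore distance and sort the retrieved top-k by recency."""
    return sorted(hits, key=lambda h: h.created_at, reverse=True)


# Top-k as returned by the index, nearest first.
top_k = [Hit("d", 0.119, 200), Hit("a", 0.120, 100), Hit("c", 0.121, 300), Hit("b", 0.300, 500)]
print([h.id for h in distance_then_newest(top_k)])  # ['c', 'd', 'a', 'b']
print([h.id for h in newest_only(top_k)])           # ['b', 'c', 'd', 'a']
```

With `digits=2`, the three hits near 0.12 are treated as ties and reordered newest first, while `b` stays last because it is clearly farther. With a finer tolerance (`digits=3`) the distances no longer tie and the original distance order is kept.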

“Sort by metadata first, then by distance within that” is harder, because the vector index doesn’t order by arbitrary metadata. You’d typically either filter by metadata (e.g. category = X) and run vector search on the filtered set, or run vector search and then sort the results by metadata. The second approach reorders away from pure distance ranking, and it only orders the top-k candidates it retrieved: an item with a better metadata value that falls outside the top-k never appears. Some systems support a single “order by” that combines distance and one or more fields; implementation varies. See range queries on metadata and pagination in ANN for how this interacts with deep pagination and filters.

Trade-off: metadata sort can reorder away from distance ranking; combined order-by depends on engine support. Practical tip: use distance-first with a metadata tie-break when possible.
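The two workarounds for “metadata first” can be sketched side by side. This is a minimal sketch assuming a tiny in-memory collection searched by brute force with Euclidean distance (`math.dist`), standing in for a real ANN index; the `Doc` class and its `category` and `price` fields are hypothetical. `prefilter_search` filters first and ranks only the survivors by distance; `price_then_distance` retrieves the top-k by distance and then reorders them with price as the primary key and distance as the secondary key.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    vector: list[float]
    category: str  # hypothetical metadata fields
    price: float


def prefilter_search(docs: list[Doc], query: list[float], category: str, k: int) -> list[Doc]:
    """Filter by metadata first, then rank the survivors by distance.
    Brute force here; a real engine would use a filter-aware ANN index."""
    survivors = [d for d in docs if d.category == category]
    return sorted(survivors, key=lambda d: math.dist(d.vector, query))[:k]


def price_then_distance(docs: list[Doc], query: list[float], k: int) -> list[Doc]:
    """Vector search first (top-k by distance), then reorder by (price, distance).
    Only these k candidates are reordered; cheaper docs outside the top-k never appear."""
    top_k = sorted(docs, key=lambda d: math.dist(d.vector, query))[:k]
    return sorted(top_k, key=lambda d: (d.price, math.dist(d.vector, query)))


docs = [
    Doc("a", [0.0, 0.0], "shoes", 50.0),
    Doc("b", [1.0, 0.0], "shoes", 20.0),
    Doc("c", [0.1, 0.0], "hats", 20.0),
    Doc("d", [3.0, 0.0], "shoes", 5.0),
]
query = [0.0, 0.0]
print([d.id for d in prefilter_search(docs, query, "shoes", k=2)])  # ['a', 'b']
print([d.id for d in price_then_distance(docs, query, k=3)])        # ['c', 'b', 'a']
```

Note that `d` is the cheapest document but is missing from the second result: it is the farthest from the query, so it never enters the top-3 that gets re-sorted. `c` and `b` share a price, and distance breaks the tie.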

Frequently Asked Questions

Can I sort vector results by metadata?

Yes, but it’s not what the index is built for. Options: (1) Get top-k by distance, then sort by metadata in application code (tie-break or full reorder). (2) Filter by metadata then run vector search. (3) Some VDBs support a combined “order by” (distance + fields). See range queries on metadata.

What if I only want to sort by metadata?

You’re effectively running a filtered metadata query; the sort key is the metadata field (e.g. created_at, price), which may require a secondary index. The vector can still be used as a filter (e.g. dropping items beyond a distance threshold) or as a score.
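A metadata-only sort in which the vector serves purely as a filter might look like the sketch below. It assumes brute-force Euclidean distance over an in-memory list; the `Item` class, its `category` and `created_at` fields, and the `max_distance` threshold are hypothetical. In a real engine, the category filter and the `created_at` ordering would ideally be served by a secondary index rather than a full scan.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    id: str
    vector: list[float]
    category: str    # hypothetical metadata fields
    created_at: int  # e.g. a Unix timestamp


def newest_similar(items: list[Item], query: list[float], category: str,
                   max_distance: float, limit: int) -> list[Item]:
    """Sort key is metadata only (created_at, newest first). The vector is used
    purely as a filter: items farther than max_distance from the query are dropped."""
    matches = [
        it for it in items
        if it.category == category and math.dist(it.vector, query) <= max_distance
    ]
    return sorted(matches, key=lambda it: it.created_at, reverse=True)[:limit]


items = [
    Item("a", [0.0, 0.0], "news", 100),
    Item("b", [0.5, 0.0], "news", 300),
    Item("c", [5.0, 0.0], "news", 400),  # newest news item, but too far from the query
    Item("d", [0.1, 0.0], "blog", 500),  # close to the query, but wrong category
]
print([it.id for it in newest_similar(items, [0.0, 0.0], "news", 1.0, limit=10)])  # ['b', 'a']
```

Distance never influences the order here; it only decides membership. Loosening the threshold lets `c` in, and it then sorts first because it is the newest.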

How does this interact with pagination?

Sorting by metadata after vector search can reorder results, so “page 2” may not align with distance-based pages. See pagination in ANN search for why deep pagination is difficult and how cursor/over-fetch strategies help.
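To make the page-alignment problem concrete, here is a minimal sketch assuming a list of hits already sorted by distance; the `Hit` class, its `price` field, and the pool size (standing in for whatever over-fetch budget you choose) are hypothetical. `naive_page` sorts each distance page separately, so a later page can contain cheaper items than an earlier one. The over-fetch version fixes a candidate pool once, sorts it by price with `id` as a final tie-breaker so the order is total, and pages through it with a keyset cursor `(price, id)`. Items beyond the pool are never reachable, which is the cost of over-fetching a bounded pool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    id: str
    distance: float  # smaller = closer
    price: float     # hypothetical metadata field


def naive_page(hits_by_distance: list[Hit], page: int, size: int) -> list[Hit]:
    """Cut a page by distance, then sort only that page by price.
    A later page may hold cheaper items than an earlier one: no global price order."""
    chunk = hits_by_distance[page * size:(page + 1) * size]
    return sorted(chunk, key=lambda h: h.price)


def build_pool(hits_by_distance: list[Hit], pool_size: int) -> list[Hit]:
    """Over-fetch once: keep the top pool_size hits by distance and sort them by
    price, with id as a final tie-breaker so the order is total and repeatable."""
    return sorted(hits_by_distance[:pool_size], key=lambda h: (h.price, h.id))


def next_page(pool: list[Hit], cursor: Optional[tuple], size: int):
    """Keyset cursor: return up to `size` hits whose (price, id) sorts after `cursor`,
    plus the cursor to pass in for the following page."""
    rest = [h for h in pool if cursor is None or (h.price, h.id) > cursor]
    page = rest[:size]
    new_cursor = (page[-1].price, page[-1].id) if page else cursor
    return page, new_cursor


hits = [  # as returned by the index, nearest first
    Hit("a", 0.1, 30.0),
    Hit("b", 0.2, 10.0),
    Hit("c", 0.3, 5.0),
    Hit("d", 0.4, 20.0),
]
print([h.id for h in naive_page(hits, 0, 2)], [h.id for h in naive_page(hits, 1, 2)])
# ['b', 'a'] ['c', 'd']  (c on page 2 is cheaper than everything on page 1)

pool = build_pool(hits, pool_size=4)
page1, cur = next_page(pool, None, 2)
page2, cur = next_page(pool, cur, 2)
print([h.id for h in page1], [h.id for h in page2])  # ['c', 'b'] ['d', 'a']
```

The keyset cursor encodes the position in the sorted pool rather than an offset, so each request just asks for items that sort after the last one already shown.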

Which systems support combined order by?

Implementation varies by VDB. Some support a single “order by” that combines distance and one or more metadata fields. Check your engine’s docs for sort and metadata filtering support.