
Namespaces and Partitioning

Namespaces (or partitions) are logical subdivisions of vectors within a collection. They let you isolate data by tenant, environment, time window, or use case so that a query runs only against the chosen namespace. For example: one namespace per customer in a SaaS app, or “2024” vs “2023” for time-based segmentation. This supports multi-tenant isolation and keeps each search space smaller, which usually makes queries faster. The isolation can be implemented physically (separate indexes) or logically (a mandatory filter on a namespace field).

Summary

Namespaces (or partitions) are logical subdivisions within a collection. They isolate data by tenant, environment, time window, or use case so queries run only against the chosen namespace. This supports multi-tenant isolation and smaller, faster searches.
Implementation: separate physical indexes (full isolation) vs. a logical filter (e.g. a mandatory namespace_id field) on one shared index, applied through metadata filtering or pre-filtering. Physical namespaces can have different schemas per namespace.
Benefits: potentially better recall and lower latency per namespace; tenant separation; drop or archive a namespace without scanning the whole collection. Trade-offs: many small namespaces increase operational overhead; cross-namespace search requires querying multiple namespaces and merging the results. See also: sharding.
Pipeline: create namespace or partition → ingest with namespace id → query with namespace scope. Practical tip: use logical namespaces for many small tenants; physical when you need strict isolation or different schemas.

How namespaces are implemented

Implementation varies: some vector DBs implement namespaces as separate physical indexes (full isolation, clear boundaries); others as a logical filter (e.g. a mandatory metadata field like namespace_id) so that every query includes a namespace condition. Physical namespaces can have different schemas or settings per namespace; logical partitioning often shares one index and uses metadata filtering or pre-filtering to restrict the search. Pipeline: create namespace or partition, ingest with namespace id, query with namespace scope so only that subset is searched.
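The two approaches can be contrasted with a small in-memory sketch. This is an illustration only, not any particular vector DB's API: the class names LogicalNamespaces and PhysicalNamespaces, the ingest and query methods, and the brute-force cosine ranking are all assumptions made for the example. Both stores follow the same pipeline (ingest with a namespace id, then query with a namespace scope).

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(candidates, query, k):
    """Brute-force ranking of (vector_id, vector) pairs by similarity."""
    scored = sorted(((cosine(query, vec), vid) for vid, vec in candidates), reverse=True)
    return [vid for _, vid in scored[:k]]


class LogicalNamespaces:
    """Hypothetical store: one shared index, mandatory namespace_id per record."""

    def __init__(self):
        self.records = []  # (namespace_id, vector_id, vector)

    def ingest(self, namespace_id, vector_id, vector):
        if not namespace_id:
            raise ValueError("namespace_id is mandatory")
        self.records.append((namespace_id, vector_id, vector))

    def query(self, namespace_id, vector, k=3):
        # Pre-filter: keep only the chosen namespace, then rank.
        scoped = [(vid, vec) for ns, vid, vec in self.records if ns == namespace_id]
        return top_k(scoped, vector, k)


class PhysicalNamespaces:
    """Hypothetical store: one separate index per namespace, nothing shared."""

    def __init__(self):
        self.indexes = defaultdict(list)  # namespace_id -> [(vector_id, vector)]

    def ingest(self, namespace_id, vector_id, vector):
        self.indexes[namespace_id].append((vector_id, vector))

    def query(self, namespace_id, vector, k=3):
        # Only the chosen namespace's index is ever touched.
        return top_k(self.indexes.get(namespace_id, []), vector, k)


for store in (LogicalNamespaces(), PhysicalNamespaces()):
    store.ingest("tenant_a", "a1", [1.0, 0.0])
    store.ingest("tenant_a", "a2", [0.0, 1.0])
    store.ingest("tenant_b", "b1", [1.0, 0.0])
    print(type(store).__name__, store.query("tenant_a", [1.0, 0.1], k=2))
```

In both variants a query scoped to tenant_a never returns tenant_b's vectors; the difference is whether that guarantee comes from a filter on shared data or from the data living in a separate structure.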

Benefits and trade-offs

Benefits: smaller index per namespace can mean better recall and lower latency; clear tenant separation for security and billing; and the ability to drop or archive a namespace (e.g. old time range) without scanning the whole collection. Trade-offs: too many small namespaces can increase operational overhead; cross-namespace search requires querying multiple namespaces and merging. See sharding for how partitioning relates to distributing data across nodes. Practical tip: use logical namespaces for many small tenants; physical when you need strict isolation or different schemas per namespace.
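The cross-namespace trade-off and the cheap drop can be sketched as follows. This is a minimal illustration under assumptions: namespaces are modeled as plain dictionaries, similarity is a dot product, and the function names search_namespace and search_across are hypothetical. Merging by score is only meaningful when every namespace uses the same embedding model and distance metric.

```python
import heapq


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search_namespace(index, query, top_k):
    """Top-k (score, vector_id) pairs within a single namespace."""
    scored = ((dot(query, vec), vid) for vid, vec in index.items())
    return heapq.nlargest(top_k, scored)


def search_across(namespaces, names, query, top_k):
    """Query each namespace separately, then merge into a global top-k."""
    merged = []
    for name in names:
        for score, vid in search_namespace(namespaces[name], query, top_k):
            merged.append((score, name, vid))
    return heapq.nlargest(top_k, merged)


# Hypothetical time-based namespaces: namespace -> {vector_id: vector}
namespaces = {
    "2023": {"doc-1": [0.9, 0.1], "doc-2": [0.2, 0.8]},
    "2024": {"doc-3": [1.0, 0.0], "doc-4": [0.1, 0.9]},
}

results = search_across(namespaces, ["2023", "2024"], [1.0, 0.0], top_k=2)
print(results)

# Archiving an old time range is a single delete; no scan of other namespaces.
del namespaces["2023"]
```

Each extra namespace in a cross-namespace search adds one more query plus merge work, which is the cost to weigh against the smaller per-namespace search space.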

Frequently Asked Questions

What are namespaces in a vector DB?

Logical subdivisions of vectors within a collection. You isolate data by tenant, environment, or time window so a query runs only against the chosen namespace. This supports multi-tenant isolation and smaller search spaces.

Physical vs. logical namespaces?

Physical: separate indexes per namespace (full isolation, different schemas possible). Logical: one index with a mandatory field (e.g. namespace_id); every query includes a namespace condition via metadata filtering or pre-filtering.

When should I use namespaces?

For multi-tenant SaaS (one namespace per customer), time-based segmentation (e.g. 2024 vs. 2023), or environment separation. Smaller index per namespace can mean better recall and lower latency; you can also drop or archive a namespace without scanning the whole collection.

How does this relate to sharding?

Namespaces partition data logically (or physically) within a collection. Sharding distributes data across nodes. A namespace can be sharded, or multiple namespaces can live on the same shards; check your vector database's documentation for how it combines the two.
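The two placements mentioned above can be sketched with a simple hash-based router. This is only one possible scheme, assumed for illustration; real systems choose their own placement strategy, and the function names here are hypothetical.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (unlike built-in hash(), stable across processes)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def shard_for_namespace(namespace_id: str, num_shards: int) -> int:
    """Whole namespace on one shard; several namespaces may share a shard."""
    return stable_hash(namespace_id) % num_shards


def shard_for_vector(namespace_id: str, vector_id: str, num_shards: int) -> int:
    """One namespace spread across shards, routed per vector."""
    return stable_hash(f"{namespace_id}/{vector_id}") % num_shards


tenants = ["tenant_a", "tenant_b", "tenant_c"]
print({t: shard_for_namespace(t, num_shards=2) for t in tenants})
print({f"v{i}": shard_for_vector("tenant_a", f"v{i}", num_shards=4) for i in range(8)})
```

With the first function a namespace-scoped query touches one shard; with the second it must fan out to every shard that holds part of the namespace.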