Basic Fundamentals · Topic 13

What is a “Point” in a Vector Database (VDB)?

A point is the basic unit of storage in a vector database: it is one vector plus a unique ID and optional metadata (payload). When you “insert a point,” you’re adding one vector and its associated identity and attributes to a collection; when you query, the engine finds the nearest points and returns their IDs and payloads.

Summary

A point = one vector + unique ID + optional metadata (payload).
The vector is used for similarity (e.g. from an embedding model); the ID is for updates, deletes, and joining back to your app; metadata enables filtered search and returning context (title, URL, etc.).
Terminology varies (“vector,” “embedding,” “record”); the index (e.g. HNSW) is built over the vectors of all points in the collection for fast queries.

Components of a point

The vector is the feature vector (e.g. produced by an embedding model) used for similarity. Its length must match the collection’s configured dimension, and it is what the ANN (approximate nearest neighbor) index uses to find nearest neighbors.

The ID lets you refer to the point for updates, deletes, and upserts. Because re-ingesting a point under the same ID overwrites the previous version, ingestion is idempotent: running it twice leaves no duplicates.

The metadata (payload) can store filterable fields (e.g. category, date, source) so you can run filtered vector search: finding nearest neighbors only among points that satisfy a condition. It can also store display data (title, URL, snippet) so each hit comes back with useful context instead of only an ID and a score.
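As a rough illustration of the three components, the sketch below models a point and a toy in-memory collection that rejects vectors of the wrong dimension. The names Point, Collection, and add are hypothetical and chosen for this example; they do not correspond to any particular product’s API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

PointId = Union[int, str]


@dataclass
class Point:
    """One stored item: a unique ID, a vector, and an optional payload."""
    id: PointId
    vector: List[float]
    payload: Dict[str, Any] = field(default_factory=dict)


class Collection:
    """Hypothetical in-memory collection that enforces a fixed vector dimension."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.points: Dict[PointId, Point] = {}

    def add(self, point: Point) -> None:
        # Every vector must match the dimension the collection was created with.
        if len(point.vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(point.vector)}"
            )
        # Keyed by ID, so adding a point with an existing ID replaces it.
        self.points[point.id] = point


collection = Collection(dimension=3)
collection.add(
    Point(
        id="doc-1",
        vector=[0.1, 0.2, 0.3],
        payload={"category": "news", "title": "Example", "url": "https://example.com"},
    )
)
```

A real engine would also build an ANN index over the stored vectors; the dictionary here only stands in for storage.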

In RAG and document search, a point often corresponds to one chunk; the ID might be a composite of document ID and chunk index. The payload then stores the chunk text or a reference to it, plus document-level fields for filtering (e.g. source, date). That way each search result can be rendered or passed to the LLM without a separate lookup.
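One possible way to turn a chunked document into points is sketched below. The helper names (make_chunk_id, chunks_to_points), the "doc_id:chunk_index" ID format, and the payload field names are assumptions made for this example. Some products restrict ID types (for instance to integers or UUIDs), in which case the composite string would need to be mapped to an allowed form.

```python
from typing import Any, Dict, List


def make_chunk_id(doc_id: str, chunk_index: int) -> str:
    """Build a composite ID from the document ID and the chunk position."""
    return f"{doc_id}:{chunk_index}"


def chunks_to_points(
    doc_id: str,
    chunks: List[str],
    vectors: List[List[float]],
    doc_fields: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Pair each chunk with its vector and attach document-level fields."""
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one vector")
    points = []
    for index, (text, vector) in enumerate(zip(chunks, vectors)):
        points.append(
            {
                "id": make_chunk_id(doc_id, index),
                "vector": vector,
                # Chunk text for display or LLM context, plus fields for filtering.
                "payload": {"doc_id": doc_id, "chunk_index": index, "text": text, **doc_fields},
            }
        )
    return points


# Placeholder vectors; in practice they would come from an embedding model.
points = chunks_to_points(
    doc_id="doc42",
    chunks=["First chunk of text.", "Second chunk of text."],
    vectors=[[0.0, 1.0], [1.0, 0.0]],
    doc_fields={"source": "wiki", "date": "2024-01-01"},
)
```

Because the ID is derived from the document ID and chunk index, re-ingesting the same document produces the same IDs, so chunks at the same positions are overwritten instead of duplicated.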

How points are used in queries

When you run a vector query, you provide a query vector and (typically) k, the number of results you want. The engine finds the k nearest points by distance or similarity and returns their IDs, scores, and optionally payloads. If you use filters, the engine may apply pre-filtering (only matching points are considered as candidates) or post-filtering (non-matching points are dropped from the results), so that only points matching the filter are returned. The point is therefore both the unit of storage and the unit of result.
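The query flow can be sketched with an exhaustive (brute-force) search that pre-filters on the payload and ranks by cosine similarity. This is a simplification: a real engine would use an ANN index such as HNSW instead of scanning every point, and the function names and the dictionary layout of a point are assumptions made for this example.

```python
import math
from typing import Any, Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, float, Dict[str, Any]]  # (id, score, payload)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    points: List[Dict[str, Any]],
    query_vector: List[float],
    k: int,
    payload_filter: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> List[Hit]:
    """Return the k most similar points, considering only those that pass the filter."""
    # Pre-filtering: drop non-matching points before scoring.
    candidates = [
        p for p in points if payload_filter is None or payload_filter(p["payload"])
    ]
    scored = [
        (p["id"], cosine_similarity(query_vector, p["vector"]), p["payload"])
        for p in candidates
    ]
    # Highest similarity first.
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:k]


points = [
    {"id": "a", "vector": [1.0, 0.0], "payload": {"category": "news"}},
    {"id": "b", "vector": [0.9, 0.1], "payload": {"category": "blog"}},
    {"id": "c", "vector": [0.0, 1.0], "payload": {"category": "news"}},
]

hits = search(points, [1.0, 0.0], k=2, payload_filter=lambda p: p["category"] == "news")
```

With the filter, the search only ever sees points "a" and "c"; without it, "b" would rank second because its vector is closer to the query than "c".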

Results are typically sorted by score, most similar first (highest similarity or smallest distance). The application can then use the returned IDs to deduplicate by document, fetch full records from another store, or pass payloads directly to a downstream step (e.g. LLM context).
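Deduplicating by document might look like the sketch below, which keeps only the best-scoring chunk per document. It assumes each hit is a dictionary with "id", "score", and a payload containing a "doc_id" field, and that a higher score means more similar; these conventions are hypothetical and differ between products.

```python
from typing import Any, Dict, List


def dedupe_by_document(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the highest-scoring hit for each document, best document first."""
    best: Dict[str, Dict[str, Any]] = {}
    # Visit hits from best to worst so the first hit seen per document wins.
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        doc_id = hit["payload"]["doc_id"]
        if doc_id not in best:
            best[doc_id] = hit
    # Dictionaries preserve insertion order, so results stay sorted by score.
    return list(best.values())


hits = [
    {"id": "doc1:3", "score": 0.88, "payload": {"doc_id": "doc1"}},
    {"id": "doc2:1", "score": 0.75, "payload": {"doc_id": "doc2"}},
    {"id": "doc1:0", "score": 0.92, "payload": {"doc_id": "doc1"}},
]

unique_hits = dedupe_by_document(hits)
```

If the engine reports distances instead of similarities, the sort direction would need to be reversed.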

Terminology across products

Terminology varies by product (e.g. “vector,” “embedding,” “record,” “document”), but the idea is the same: one row-like entity per vector in the VDB. The index (e.g. HNSW) is built over the vectors of all points in the collection so that queries can quickly retrieve the closest points. Some systems support multiple vectors per document (multiple points or a multi-vector point); that’s an extension of the basic point model.

Frequently Asked Questions

Do I have to provide an ID for every point?

Most vector DBs require a unique ID for each point: either you provide it or the system auto-generates one. Supplying your own ID is useful for updates, deletes, and idempotent re-ingestion (e.g. re-embedding after model drift).

Can I store the original text or image in the point?

Yes, as metadata (payload). Many systems let you store JSON or specific fields so you can return title, URL, or raw text with search results. See the related topics “storing payloads alongside vectors” and “raw data vs. only vectors.”

What happens if I insert two points with the same ID?

Typically the write is treated as an upsert: within the same collection, the new point replaces the old one. That’s how you update a vector or metadata without leaving duplicates. Behavior may vary by product; check the docs for upsert semantics.
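The replace-on-same-ID behavior can be illustrated with a dictionary keyed by point ID. The upsert function below is a hypothetical stand-in written for this example, not a real client call; actual products may replace the whole point, merge payloads, or reject the write.

```python
from typing import Any, Dict, List, Optional


def upsert(
    collection: Dict[str, Dict[str, Any]],
    point_id: str,
    vector: List[float],
    payload: Optional[Dict[str, Any]] = None,
) -> bool:
    """Insert the point, or replace it if the ID exists. Returns True on replace."""
    replaced = point_id in collection
    # Full replacement: the old vector and payload are discarded.
    collection[point_id] = {"vector": vector, "payload": payload or {}}
    return replaced


collection: Dict[str, Dict[str, Any]] = {}
upsert(collection, "doc-1", [0.1, 0.2], {"title": "Draft"})
upsert(collection, "doc-1", [0.3, 0.4], {"title": "Final"})
```

After both calls the collection holds a single point under "doc-1" carrying the second vector and payload.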

Is a point the same as a document in search engines?

Conceptually similar: a document often has an ID, a representation used for ranking (here, the vector), and fields (here, metadata). In vector DBs the ranking is by nearest neighbor; in classic search it’s by keyword or other signals.