The concept of “Embedding” simplified
An embedding is a vector that represents a piece of data—a sentence, an image, a user—so that similar things end up close together in a shared space. The model that produces embeddings is trained so that “closeness” in this space reflects similarity in meaning or structure, which is what makes semantic search and vector databases useful.
Summary
- An embedding is a vector produced by a model so that similar inputs map to nearby vectors.
- You don’t hand-write embeddings; models (e.g. transformers, CNNs/ViTs) compute them; similarity is measured with cosine or L2.
- Embeddings are usually dense and live in a latent space; normalizing (unit length) is common.
- Stored in a vector database, embeddings let you query for “vectors near this one” to retrieve similar content, bridging raw unstructured data and meaning-based search.
What an embedding is
An embedding is a fixed-size vector that stands for a piece of data—a sentence, a paragraph, an image, a user ID, etc. The crucial property: if two inputs are semantically or structurally similar, their embedding vectors are close under a measure such as cosine similarity or Euclidean distance. You don’t hand-write embeddings; a model (e.g. a neural network) computes them. For text, models like transformers turn a passage into a fixed-size vector. For images, CNNs or vision transformers do the same. So you can store embeddings in a vector database and query by “vectors near this one” to get similar content.
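To make “close” concrete, here is a minimal sketch of the two standard measures, using tiny hand-made 4-dimensional vectors (real models emit hundreds of dimensions; the vector values here are invented purely for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 (Euclidean) distance: 0.0 = identical vectors."""
    return float(np.linalg.norm(a - b))

# Toy "embeddings" -- made-up numbers standing in for model output.
cat = np.array([0.9, 0.1, 0.0, 0.2])
kitten = np.array([0.85, 0.15, 0.05, 0.25])
car = np.array([0.1, 0.9, 0.3, 0.0])

print(cosine_similarity(cat, kitten))  # near 1.0: similar inputs, nearby vectors
print(cosine_similarity(cat, car))     # much lower
```

A good embedding model would place “cat” and “kitten” this close and “car” far away; the similarity function itself is just geometry.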
The dimensionality of the embedding (e.g. 384, 768, 1536) is fixed by the model. Higher dimensions can capture more nuance but use more storage and compute; the right choice depends on your corpus size, recall requirements, and latency budget. Embeddings are typically dense (most dimensions non-zero), unlike sparse term-based representations used in classic IR.
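The storage cost of dimensionality is easy to estimate. A rough sketch, assuming uncompressed float32 vectors in a flat index (quantization and compression would lower these numbers):

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw vector storage for a flat (uncompressed) index, float32 by default."""
    return num_vectors * dim * bytes_per_float

# One million document chunks at three common dimensionalities:
for dim in (384, 768, 1536):
    gb = index_size_bytes(1_000_000, dim) / 1e9
    print(f"dim={dim}: {gb:.2f} GB")
```

So moving from 384 to 1536 dimensions quadruples raw vector storage (roughly 1.5 GB vs 6.1 GB per million vectors), which is why dimensionality is a cost decision, not just a quality one.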
Latent space and normalization
Embeddings live in a latent space—a space learned by the model, whose dimensions aren’t hand-named but capture abstract features. Normalizing embeddings (e.g. to unit length) is common so that similarity can be measured with cosine or dot product without vector magnitude dominating. In short: an embedding turns data into a vector so that similarity in the vector space mirrors similarity in the real world; that is the bridge between raw unstructured data and fast, meaning-based search in a vector database (VDB).
The latent space is shared: all items embedded with the same model live in the same space, so you can compare text to text, image to image, or—with multimodal models—text to image. Unit-length normalization makes cosine similarity and dot product equivalent and stabilizes training and retrieval.
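The equivalence between cosine similarity and dot product after unit-length normalization can be verified directly. A small sketch using random vectors as stand-ins for model output:

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (L2 norm == 1)."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
a, b = rng.normal(size=768), rng.normal(size=768)  # stand-ins for embeddings
a_n, b_n = normalize(a), normalize(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_normalized = np.dot(a_n, b_n)

print(np.isclose(cosine, dot_of_normalized))  # True: same value after normalization
```

This is why many vector databases let you pick dot product as the metric when you store normalized vectors: it gives cosine ranking at lower cost per comparison.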
How embeddings are used in a VDB
In a typical flow, you embed your corpus (e.g. document chunks) and upsert the vectors into a collection with optional metadata. At query time you embed the query with the same model and run ANN or exact nearest-neighbor search. The choice of embedding model and its dimensionality affects recall, memory use, and latency. For domain-specific data, fine-tuning the model can improve relevance.
Index and query must use the same model and normalization so that distances are comparable. If you upgrade or change the model, you typically need to re-embed the corpus and rebuild the index. Metrics like recall@k and latency (p50, p99) help you evaluate whether an embedding model is right for your workload.
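The index-time/query-time flow above can be sketched end to end. The `embed` function here is a deterministic stub (pseudo-random vectors keyed by the text) so the example is self-contained and runnable; in practice you would call your embedding model’s encode function at both index and query time—the same model, as stressed above:

```python
import numpy as np
import zlib

def embed(text: str, dim: int = 64) -> np.ndarray:
    """STUB for a real embedding model: deterministic pseudo-random vectors,
    so identical text always maps to the identical vector. No real semantics."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)  # unit length, so dot product == cosine

# Index time: embed the corpus once and stack the vectors into a matrix.
corpus = [
    "a cat sat on the mat",
    "stock prices fell today",
    "kittens are small cats",
]
index = np.stack([embed(doc) for doc in corpus])

# Query time: embed the query with the SAME model, then exact nearest-neighbor search.
def search(query: str, k: int = 2) -> list[tuple[str, float]]:
    scores = index @ embed(query)   # cosine scores, since all vectors are unit length
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

for doc, score in search("a cat sat on the mat"):
    print(f"{score:.3f}  {doc}")
```

With the stub, only an exact string match scores near 1.0; with a real model, paraphrases would too. A production VDB replaces the exact `argsort` scan with an ANN index (e.g. HNSW) for scale.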
Cross-encoders vs. bi-encoders
Most VDB workflows use bi-encoders: query and documents are encoded separately into vectors, and similarity is computed between vectors (e.g. cosine). Cross-encoders take query and document together and output a single relevance score; they’re more accurate but too slow to run against every document. So the usual pattern is: bi-encoder for retrieval (vector DB), then optionally re-rank a small set with a cross-encoder.
Bi-encoders scale because you encode each document once and each query once; similarity is just a distance between two vectors. Cross-encoders require the query and each candidate document to be processed together, so they’re used as a second stage on a small candidate set (e.g. top 20 or 100 from the vector DB) to improve precision without sacrificing speed.
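The second-stage re-ranking step can be sketched as follows. The `cross_score` function here is a stub (word-overlap Jaccard) standing in for a real cross-encoder, which would jointly encode each (query, document) pair with a transformer; the candidate list stands in for stage-one output from the vector DB:

```python
def cross_score(query: str, doc: str) -> float:
    """STUB for a real cross-encoder: word-overlap (Jaccard) similarity,
    just to make the two-stage flow runnable."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Stage 2: re-score ONLY the small candidate set returned by the vector DB."""
    return sorted(candidates, key=lambda doc: cross_score(query, doc), reverse=True)[:top_k]

# Stage 1 (bi-encoder + vector DB) would return e.g. the top 100 candidates;
# a hard-coded set here shows just the re-ranking step.
candidates = [
    "how to reset your password",
    "password reset steps for the mobile app",
    "annual report 2023",
]
print(rerank("reset password", candidates, top_k=2))
```

The cost structure is the point: the expensive pairwise scorer runs only tens of times per query, while the cheap vector comparison handles the full corpus.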
Frequently Asked Questions
What is the difference between a vector and an embedding?
An embedding is a vector produced by a model so that “similar” inputs map to nearby vectors. So every embedding is a vector; not every vector is an embedding (e.g. hand-crafted feature vectors).
Do I need the same embedding model for indexing and querying?
Yes. Query and corpus must use the same model (and usually same normalization) so that distances in the latent space are comparable. If you change models, you typically need to re-embed and re-index.
Can I use embeddings for images and text together?
Yes, with multi-modal embedding (e.g. CLIP): one model embeds both images and text into the same space so you can search images by text or vice versa.
How do I choose embedding dimensionality?
A larger dimension usually means more expressiveness but also more memory and compute per query. See the articles on the impact of embedding model dimensionality on VDB performance and on choosing the right embedding model.