Basic Fundamentals · Topic 18

What is a “Feature Vector”?

A feature vector is a fixed-length list of numbers that represents an object for machine learning or similarity search. In the context of a vector database, the “vector” in each point is typically a feature vector—often an embedding from a model (e.g. a text or image encoder). The dimensions of the vector are the “features”; similarity between two items is computed by a distance or similarity metric on their feature vectors.

Summary

  • A feature vector = fixed-length list of numbers representing an object; dimensions = “features.”
  • In a VDB, the vector in a point is usually a feature vector (often an embedding); similarity is computed with cosine, L2, or dot product.
  • Can be dense (e.g. neural embeddings) or sparse (e.g. bag-of-words); VDBs index and search by these vectors in latent space.
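The three metrics mentioned above can be sketched in plain Python. The vectors here are made-up 3-dimensional examples, not output from any real embedding model:

```python
import math

def dot(a, b):
    # Dot product: sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    # Euclidean (L2) distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # Cosine similarity: dot product normalized by both magnitudes.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, twice the length

print(cosine(u, v))  # 1.0: identical direction, so maximal cosine similarity
print(l2(u, v))      # ~3.742: nonzero, because the magnitudes differ
```

Note how the two metrics disagree on these vectors: cosine ignores magnitude and sees them as identical, while L2 does not. This is why vector DBs let you pick the metric per collection.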

Definition

In ML, a feature vector is the numerical representation of one data point: each dimension is a “feature” (hand-crafted or learned). In a vector database, the vector stored in each point is typically such a feature vector—often produced by an embedding model (e.g. transformer, CNN) so that similar inputs map to similar vectors. The dimensions usually aren’t interpretable (they’re learned); what matters is that distance or similarity in this space reflects semantic or structural similarity, which is what makes nearest-neighbor search useful.

Hand-crafted feature vectors (e.g. from domain rules or classical ML feature engineering) have interpretable dimensions (age, price, category) but may not capture complex similarity as well as learned embeddings. In vector DBs, the term “vector” almost always refers to a feature vector, and in modern setups it’s usually an embedding.
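A minimal sketch of a hand-crafted feature vector, using a hypothetical product listing with the interpretable dimensions named above (age, price, one-hot category); the field names and categories are illustrative only:

```python
def product_features(age_days, price, category,
                     categories=("book", "toy", "tool")):
    # Each dimension has an explicit, human-readable meaning,
    # unlike the learned dimensions of a neural embedding.
    one_hot = [1.0 if category == c else 0.0 for c in categories]
    return [float(age_days), float(price)] + one_hot

vec = product_features(age_days=30, price=9.99, category="toy")
print(vec)  # [30.0, 9.99, 0.0, 1.0, 0.0]
```

In practice you would also scale the numeric dimensions (age and price live on very different ranges), otherwise one feature dominates the distance.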

Dense vs. sparse feature vectors

Feature vectors can be dense (most dimensions non-zero, as with neural embeddings) or sparse (e.g. bag-of-words, BM25-style). Vector DBs are built to index and search by these vectors: the latent space is the space of feature vectors produced by the embedding model, and nearest-neighbor search finds points whose feature vectors are closest to the query vector. So “feature vector” is the general term; “embedding” usually means a feature vector from a learned model.
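The dense/sparse distinction can be sketched with a toy bag-of-words example (assumed four-word vocabulary; real IR systems use term weighting like BM25 rather than raw counts):

```python
vocab = ["cat", "dog", "fish", "bird"]

def bag_of_words(text):
    # Sparse representation: store only non-zero dimensions,
    # as a mapping of dimension index -> term count.
    counts = {}
    for word in text.split():
        if word in vocab:
            i = vocab.index(word)
            counts[i] = counts.get(i, 0) + 1
    return counts

def to_dense(sparse, dim):
    # Expand the sparse mapping into a full fixed-length vector.
    vec = [0.0] * dim
    for i, count in sparse.items():
        vec[i] = float(count)
    return vec

s = bag_of_words("cat dog cat")
print(s)                        # {0: 2, 1: 1}
print(to_dense(s, len(vocab)))  # [2.0, 1.0, 0.0, 0.0]
```

With a realistic vocabulary of tens of thousands of terms, almost every dimension is zero, which is why sparse vectors are stored as index/value pairs rather than full arrays.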

Sparse feature vectors are common in traditional IR (term weights); dense vectors dominate in neural retrieval. Some vector DBs support both in a single hybrid index so you can combine semantic (dense) and lexical (sparse) signals in one query.
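One simple way hybrid retrieval can combine the two signals is a weighted sum of a dense (semantic) similarity and a sparse (lexical) score. This is a generic sketch, not any particular vector DB's fusion method—real systems may use reciprocal rank fusion or other schemes:

```python
def hybrid_score(dense_sim, sparse_score, alpha=0.7):
    # alpha weights the semantic (dense) signal against the
    # lexical (sparse) one; both scores assumed pre-normalized.
    return alpha * dense_sim + (1 - alpha) * sparse_score

print(hybrid_score(0.9, 0.4))  # 0.75
```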

Relation to vectors and embeddings

When we say a VDB stores “vectors,” we mean it stores feature vectors (often embeddings) and supports queries by vector similarity. So: vector = ordered list of numbers; feature vector = vector used as the object’s representation for ML/similarity; embedding = feature vector from a learned model. All three terms appear in VDB docs; in practice the point’s vector is a feature vector, and in most modern setups it’s an embedding.
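At its core, "queries by vector similarity" means nearest-neighbor search over stored feature vectors. A brute-force sketch with made-up 2-d points (a real vector DB does this at scale with approximate indexes like HNSW rather than a full scan):

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stored points: id -> feature vector.
points = {
    "doc_a": [0.1, 0.9],
    "doc_b": [0.8, 0.2],
    "doc_c": [0.2, 0.8],
}

def nearest(query, k=2):
    # Rank all stored points by distance to the query vector.
    ranked = sorted(points, key=lambda pid: l2(query, points[pid]))
    return ranked[:k]

print(nearest([0.12, 0.88]))  # ['doc_a', 'doc_c']
```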

Frequently Asked Questions

What is the difference between a feature vector and an embedding?

An embedding is a feature vector produced by a learned model (e.g. neural network) so that similar inputs map to nearby vectors. Hand-crafted features (e.g. [age, price, category_one_hot]) are also feature vectors but not necessarily “embeddings” in the ML sense. In VDBs we mostly use embeddings.

Can I use hand-crafted feature vectors in a vector DB?

Yes. Any fixed-length numerical vector can be stored and queried. Hand-crafted features may not have the same “similarity = meaning” property as embeddings; they’re useful when you have structured attributes and a clear similarity definition.

Do all dimensions in a feature vector have to be meaningful?

For learned embeddings, dimensions are usually not individually interpretable; the whole vector encodes the object. For hand-crafted features, each dimension often has a meaning (e.g. age, count). What matters for the VDB is that the chosen distance metric matches your notion of similarity.
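A toy 2-d illustration of why the metric must match your notion of similarity: the same pair of candidates can rank differently under cosine and L2 (the vectors are invented for the example):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 0.0]
a = [10.0, 0.0]  # same direction as q, but far away
b = [0.9, 0.5]   # close to q, but a different direction

print(cosine(q, a) > cosine(q, b))  # True: cosine prefers a (direction)
print(l2(q, a) < l2(q, b))          # False: L2 prefers b (proximity)
```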

Why is “feature vector” used in vector DB docs?

It’s the standard ML term for “numerical representation of one item.” VDBs are agnostic to how the vector was produced (embedding model, hand-crafted, etc.); they index and search by the vector. “Feature vector” is the generic term; “embedding” is the common case.