
Definition of a Vector Database vs. a Relational Database

A relational database stores rows and columns and answers queries with exact matches, range filters, and joins. A vector database stores vectors (e.g. embeddings) and answers queries by similarity, returning the nearest neighbors of a query vector: items that are “close” in meaning or structure, not just equal in value.

Summary

Relational DB: structured rows/columns, B-tree–style indexes, exact match and range queries, SQL, transactions.
Vector DB: stores vectors (and optional metadata), answers by similarity / nearest neighbor, uses approximate nearest neighbor (ANN) indexes.
B-trees and one-dimensional order don’t support efficient vector similarity search; vector DBs use graph or cluster-based indexes.
Relational = structured data; vector = unstructured or semi-structured data turned into embeddings, powering semantic search, RAG, recommendations, and chatbots.
In practice the two are often used together: relational for business data, vector DB for embeddings and vector queries.

How relational databases work

In a relational database, you store data in tables: rows and columns with a fixed or evolving schema. You index columns with structures like B-trees to speed up exact lookups (“WHERE id = 5”), range scans (“WHERE price BETWEEN 10 AND 20”), and sorted order. Equality and order are well-defined on scalar or low-dimensional values. Transactions (ACID) ensure consistency; SQL gives a declarative way to join tables and filter. This model excels for structured business data: users, orders, inventory, logs with clear fields.
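
The query types above can be tried with Python's built-in sqlite3 module. This is a minimal sketch, not a recommended schema: the products table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database; table, column, and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
# SQLite stores this secondary index as a B-tree keyed on price.
conn.execute("CREATE INDEX idx_products_price ON products (price)")

rows = [(1, "pen", 2.5), (5, "notebook", 12.0), (7, "backpack", 45.0), (9, "lamp", 18.5)]
with conn:  # transaction: commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

# Exact lookup on the primary key.
exact = conn.execute("SELECT name FROM products WHERE id = ?", (5,)).fetchall()

# Range scan with sorted output, the kind of query an ordered index serves well.
in_range = conn.execute(
    "SELECT name FROM products WHERE price BETWEEN ? AND ? ORDER BY price",
    (10, 20),
).fetchall()

print(exact)     # [('notebook',)]
print(in_range)  # [('notebook',), ('lamp',)]
```

Both queries rely on values that can be compared for equality or placed in a single sorted order, which is the assumption the rest of this topic contrasts with vectors.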

The relational model has dominated enterprise data for decades because it fits naturally with how many business processes work: entities with fixed attributes, relationships between them, and the need for consistent updates and complex queries across multiple tables. Aggregations, reporting, and transactional integrity are all first-class concerns in relational systems.

How vector databases work

In a vector database, the main operation is similarity search: given a query vector, return the stored vectors that are closest under a distance metric (e.g. cosine distance or L2, also called Euclidean, distance). That kind of query is not what relational indexes are built for, which is why vector databases use a dedicated architecture and indexes (e.g. HNSW, IVF) that exploit high-dimensional geometry. You typically store each vector with an id and optional metadata in a collection or index, then query by “vectors like this one.”
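
To make “query by vector” concrete, here is a minimal brute-force sketch in plain Python. It scores every stored vector against the query, which is the exact search that ANN indexes such as HNSW or IVF approximate at scale. The record layout (id, vector, metadata) and the function names are assumptions for illustration, not any product's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, collection, k=2):
    """Exact (brute-force) search: score every record, return the top-k ids."""
    scored = [(cosine_similarity(query, rec["vector"]), rec["id"]) for rec in collection]
    scored.sort(reverse=True)  # highest similarity first
    return [rec_id for _, rec_id in scored[:k]]

# Hypothetical collection: each record has an id, a vector, and optional metadata.
collection = [
    {"id": "doc-a", "vector": [0.9, 0.1, 0.0], "metadata": {"lang": "en"}},
    {"id": "doc-b", "vector": [0.8, 0.2, 0.1], "metadata": {"lang": "en"}},
    {"id": "doc-c", "vector": [0.0, 0.1, 0.9], "metadata": {"lang": "de"}},
]

top = nearest([1.0, 0.0, 0.0], collection, k=2)
print(top)  # ['doc-a', 'doc-b']
```

The cost of this scan grows linearly with the number of stored vectors, which is the reason dedicated indexes exist.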

Vector databases are built to scale similarity search to millions or billions of vectors while keeping latency low. They often combine specialized indexes, efficient distance computation (including hardware acceleration), and optional metadata filtering so that you can narrow the search space before or after running the vector comparison. The API is usually centered on “insert vectors” and “query by vector” rather than SQL’s select/join/filter.
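
The difference between filtering before and after the vector comparison can be sketched as follows. This is a toy illustration using exact L2 ranking over a small list; the function names, the where predicate, and the record layout are hypothetical, and real systems integrate filtering with their ANN index in various ways.

```python
def l2_squared(a, b):
    """Squared Euclidean distance; ranks the same as true L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_pre_filter(query, records, k, where):
    """Pre-filter: keep records whose metadata matches, then rank the survivors."""
    candidates = [r for r in records if where(r["metadata"])]
    candidates.sort(key=lambda r: l2_squared(query, r["vector"]))
    return [r["id"] for r in candidates[:k]]

def search_post_filter(query, records, k, where):
    """Post-filter: rank everything, take the top-k, then drop non-matching hits."""
    ranked = sorted(records, key=lambda r: l2_squared(query, r["vector"]))[:k]
    return [r["id"] for r in ranked if where(r["metadata"])]

records = [
    {"id": 1, "vector": [0.0, 0.0], "metadata": {"category": "shoes"}},
    {"id": 2, "vector": [0.1, 0.0], "metadata": {"category": "hats"}},
    {"id": 3, "vector": [0.2, 0.1], "metadata": {"category": "shoes"}},
    {"id": 4, "vector": [5.0, 5.0], "metadata": {"category": "hats"}},
]

def is_hat(metadata):
    return metadata["category"] == "hats"

pre = search_pre_filter([0.0, 0.0], records, k=2, where=is_hat)
post = search_post_filter([0.0, 0.0], records, k=2, where=is_hat)
print(pre)   # [2, 4]
print(post)  # [2]
```

In this sketch the post-filter variant returns fewer than k results because one of the top-2 hits fails the predicate, while the pre-filter variant always ranks only matching records. Which trade-off a given system makes is something to check in its documentation.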

Why B-trees don’t fit vectors

In a B-tree, each step of a lookup compares the search key with the keys in a node and follows one child pointer. That assumes a clear, one-dimensional ordering. A vector has hundreds or thousands of dimensions; you can impose an order (e.g. lexicographic), but that order does not match similarity. Two vectors that are close in cosine or L2 distance can be far apart in lexicographic order, and two that are adjacent in lexicographic order can be far apart in distance. So an index that relies on one-dimensional order cannot prune the search space the way you need for fast ANN search.
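
A small experiment shows the mismatch. The sketch below sorts four hand-picked 3-dimensional vectors lexicographically (as a one-dimensional index would), finds the query's neighbors in that sorted order, and compares them with the neighbors by cosine similarity. The vectors and names are invented for the demonstration.

```python
import bisect
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-picked vectors, chosen so the two orderings disagree.
stored = {
    "w": [0.10, 0.11, 0.00],
    "x": [0.49, 0.00, 0.90],
    "y": [0.51, 0.00, 0.95],
    "z": [0.90, 0.92, 0.01],
}
query = [0.50, 0.50, 0.00]

# Python compares lists lexicographically, so sorting mimics a one-dimensional index.
ordered = sorted(stored.items(), key=lambda item: item[1])
keys = [vec for _, vec in ordered]
pos = bisect.bisect_left(keys, query)  # where the query would sit in sorted order
lexicographic_neighbors = {name for name, _ in ordered[max(pos - 1, 0):pos + 1]}

# Neighbors by actual similarity.
by_cosine = sorted(stored, key=lambda n: cosine_similarity(query, stored[n]), reverse=True)
cosine_neighbors = set(by_cosine[:2])

print(sorted(lexicographic_neighbors))  # ['x', 'y']
print(sorted(cosine_neighbors))         # ['w', 'z']
```

Here the entries adjacent to the query in sorted order are the least similar ones, and the most similar ones sit at opposite ends of the order, so scanning a sorted range around the query would miss them.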

On top of that, high-dimensional space works against simple partitioning: axis-aligned partitions don’t create tight regions, and many points end up near boundaries, so a query has to inspect many partitions. Vector databases work around this by using indexes designed for similarity (graph-based or cluster-based) that exploit geometry and distance rather than scalar order. That’s why you need a system built for vectors, not a relational engine with a vector column bolted on, when scale and latency matter.
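
As a rough idea of what “cluster-based” means, here is a toy sketch in the spirit of an IVF index: vectors are assigned to the nearest centroid, and a query scans only the buckets of its closest centroids. The class name and the hand-fixed centroids are assumptions for illustration (real IVF indexes typically learn centroids, for example with k-means), and the nprobe parameter name is borrowed from common usage rather than from any specific API.

```python
def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class ToyIVFIndex:
    """Toy cluster-based index: one bucket of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vector, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: l2_squared(vector, self.centroids[i]),
        )
        return order[:n]

    def add(self, item_id, vector):
        cluster = self._nearest_centroids(vector, 1)[0]
        self.buckets[cluster].append((item_id, vector))

    def search(self, query, k=1, nprobe=1):
        # Scan only the nprobe closest buckets instead of every stored vector.
        candidates = []
        for cluster in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[cluster])
        candidates.sort(key=lambda pair: l2_squared(query, pair[1]))
        return [item_id for item_id, _ in candidates[:k]]

# Centroids are fixed by hand here to keep the example deterministic.
index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 0.2])
index.add("b", [1.0, 1.5])
index.add("c", [9.0, 9.5])
index.add("d", [10.5, 9.0])

result = index.search([9.2, 9.4], k=1, nprobe=1)
print(result)  # ['c']
```

With nprobe=1 the query compares against two vectors instead of four. The search is approximate because a true neighbor sitting just across a cluster boundary would be missed; raising nprobe trades speed for recall.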

Structured vs. unstructured data

Relational databases excel at structured data: fixed schemas, transactions, and declarative SQL. Vector databases target unstructured or semi-structured data that has been turned into vectors—text, images, behavior. That powers semantic search, recommendations, chatbots, and RAG. In practice, the two are often used together: a relational system holds business data and metadata, while a vector database holds embeddings and serves vector queries. So: relational = exact match and structure; vector = similarity and meaning.

When you have both structured records (e.g. product catalog, user profiles) and unstructured content (e.g. descriptions, reviews, images), the typical pattern is to keep the structured data in a relational store and the embeddings in a vector store. The application joins them by ID or key when building responses, so each system does what it does best.
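
The join-by-ID pattern might look like the sketch below. It uses sqlite3 for the relational side and a plain dictionary as a stand-in for the vector store; the table, the embeddings, and the similar_products helper are all hypothetical, and a real deployment would call a vector database client where the dictionary scan appears.

```python
import math
import sqlite3

# Relational side: structured product records (illustrative schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "trail shoe", 89.0), (2, "rain jacket", 120.0), (3, "running shoe", 75.0)],
)

# Vector side: a dict standing in for a vector store, keyed by the same product id.
embeddings = {1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.8, 0.3]}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_products(query_vector, k):
    # Step 1: similarity search returns ids only.
    ranked = sorted(
        embeddings,
        key=lambda pid: cosine_similarity(query_vector, embeddings[pid]),
        reverse=True,
    )[:k]
    # Step 2: fetch the structured rows for those ids from the relational store.
    placeholders = ",".join("?" for _ in ranked)
    rows = conn.execute(
        f"SELECT id, name, price FROM products WHERE id IN ({placeholders})", ranked
    ).fetchall()
    # Step 3: restore similarity order, since SQL does not preserve the IN-list order.
    by_id = {row[0]: row for row in rows}
    return [by_id[pid] for pid in ranked]

results = similar_products([1.0, 0.0], k=2)
print(results)  # [(1, 'trail shoe', 89.0), (3, 'running shoe', 75.0)]
```

The shared key is the only coupling between the two stores, so each can be scaled or replaced independently.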

Frequently Asked Questions

Can I use a relational database for vector search?

Some relational systems (e.g. PostgreSQL with pgvector) add vector types and indexes. They can work for small or medium scale. For very large scale or lowest latency, purpose-built vector databases are optimized for ANN and vector query patterns. See Vector Database vs. Vector-capable Relational Database for the full debate.

Do I still need a relational database if I have a vector database?

Often yes. Relational DBs are better for user accounts, permissions, transactional records, and joins. Vector DBs are for similarity search over embeddings. Many apps use both: relational for “who/what” and vector for “find similar.”

What about ACID and transactions in a vector database?

Vector DBs vary. Some offer single-document or single-collection guarantees; fewer offer full multi-key ACID transactions. For details, see data persistence guarantees and ACID compliance in vector databases.

Can I run SQL on a vector database?

Some vector DBs support SQL-like or query-language interfaces for metadata filtering and management. The core operation, similarity search, is still a nearest-neighbor lookup, not a relational join. Check your vendor’s docs for supported query forms.