Indexing Algorithms - HNSW · Topic 69

Navigable Small World (NSW) vs. HNSW

NSW (Navigable Small World) is a single-layer graph: every vector is a node with links to its nearest neighbors, giving small-world connectivity. Search is a greedy walk from an entry point. HNSW extends NSW by adding multiple layers (a hierarchy): the top layer has few nodes and long-range links, and you "drop" layer by layer until the bottom, where all nodes live. So HNSW = NSW + hierarchy; production systems use HNSW for better scaling and a better recall–latency trade-off.

Summary

  • NSW: one graph, greedy search only; hops can grow with n, so query time scales worse. HNSW: top layers = express network → typically O(log n) search cost and a better recall–latency trade-off at scale.
  • Understanding NSW explains why the graph is built the way it is; understanding skip lists and hierarchy explains why HNSW is faster.
  • Build: NSW connects each node to M neighbors on one layer; HNSW adds layer assignment and multi-layer links, so build is more involved but pays off at query time.
  • Production vector DBs use HNSW (or HNSW-like) for graph-based ANN; NSW is mainly of historical and conceptual interest.
  • Trade-off: HNSW uses more memory (links on multiple layers per node) and has a higher build cost than NSW, but delivers sublinear query time and better recall for the same M.

NSW: single-layer small-world graph

In NSW, all vectors form one graph. Each node is connected to up to M nearest neighbors (found at build time). The graph has the small-world property: short average path length and high local clustering. At query time, you start at an entry point (e.g. random or fixed) and greedily move to the neighbor closest to the query until no improvement is possible. There are no layers—just one graph. So search is a pure greedy walk; the number of hops can grow with dataset size and the structure of the graph, which limits scalability.
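The greedy walk described above can be sketched in a few lines. The adjacency-dict layout and function name below are illustrative assumptions, not from any particular library:

```python
import numpy as np

def greedy_search(graph, vectors, query, entry):
    """Greedy walk on a single-layer NSW graph (sketch).

    graph:   dict mapping node id -> list of neighbor node ids
    vectors: np.ndarray of shape (n, d)
    query:   np.ndarray of shape (d,)
    entry:   starting node id (random or fixed)
    """
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        improved = False
        # Move to the closest neighbor of the current node, if any is closer.
        for nb in graph[current]:
            d = np.linalg.norm(vectors[nb] - query)
            if d < current_dist:
                current, current_dist = nb, d
                improved = True
        if not improved:  # local minimum: no neighbor improves on current
            return current, current_dist
```

Note that the walk stops at a local minimum; real NSW implementations restart from multiple entry points to reduce the chance of getting stuck.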

HNSW: NSW plus hierarchy

HNSW keeps the same small-world connectivity idea but adds a hierarchy inspired by skip lists. Each node is assigned a random maximum layer; the top layer has few nodes and sparse long-range links, and the bottom layer has all nodes. Search starts at the top layer, takes a few greedy steps to get into the right region, then “drops” to the next layer and repeats. On the bottom layer, the search refines and returns the top-k from a candidate set. So HNSW = NSW on each layer, plus the ability to “skip” across the graph via upper layers before doing fine search at the bottom. That yields O(log n) expected hops and a better recall–latency trade-off at scale.
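The layered descent can be sketched as below. The list-of-adjacency-dicts layout is an assumption for illustration; a real HNSW search also keeps an ef-sized candidate heap on the bottom layer to return top-k, where this sketch tracks only a single best node:

```python
import numpy as np

def hnsw_descent(layers, vectors, query, entry):
    """Greedy descent through HNSW layers (sketch).

    layers[0] is the bottom graph containing all nodes;
    layers[-1] is the sparse top layer. Each layer is a
    dict mapping node id -> list of neighbor node ids.
    """
    current = entry
    for graph in reversed(layers):          # start at the top layer
        if current not in graph:            # node absent from this layer
            continue
        improved = True
        while improved:                     # greedy walk within the layer
            improved = False
            cur_dist = np.linalg.norm(vectors[current] - query)
            for nb in graph[current]:
                d = np.linalg.norm(vectors[nb] - query)
                if d < cur_dist:
                    current, cur_dist = nb, d
                    improved = True
    # `current` is now the greedy best on the bottom layer; a full
    # implementation would expand an ef-sized candidate set from here.
    return current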

Query scaling

In NSW, the number of hops to reach a target can grow with the dataset size, so query time tends to scale worse. In HNSW, the top layers act like an express network—you quickly get into the right region—then the bottom layer does the fine search. That yields better scaling: search cost is typically O(log n) in HNSW vs. higher in plain NSW for the same recall. Build logic is also more involved in HNSW (layer assignment, multi-layer link creation), but the payoff is a better recall–latency trade-off at scale.
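The logarithmic behavior comes from how layers are assigned at build time: each node draws its maximum layer from an exponentially decaying distribution, so roughly a 1/M fraction of nodes survives to each higher layer. A minimal sketch of that assignment (function name is hypothetical):

```python
import math
import random

def sample_level(M=16, rng=random.random):
    """HNSW layer assignment (sketch): a node reaches layer k with
    probability roughly (1/M)^k, using the normalization
    m_L = 1/ln(M). The expected height of the resulting hierarchy
    grows like log_M(n), which is what gives O(log n) search hops."""
    m_l = 1.0 / math.log(M)
    return int(-math.log(rng()) * m_l)
```

With M = 16, about 94% of nodes live only on the bottom layer, ~6% reach layer 1, ~0.4% reach layer 2, and so on—exactly the skip-list-style thinning that makes the upper layers sparse.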

When to think about NSW

In practice, NSW is mainly of historical and conceptual interest; production vector DBs use HNSW (or HNSW-like) for graph-based ANN. Understanding NSW helps see why the graph is built the way it is; understanding skip lists and the hierarchy explains why HNSW is faster.

Pipeline: when building an index, HNSW assigns each new node a maximum layer, then connects it to up to M neighbors on each layer from that max down to 0—so each layer is a small-world graph like NSW. At query time, entry point selection starts at the top layer; the hierarchy reduces hops compared to a single NSW graph. Memory-consumption concerns and the general limitations of graph-based indexes apply to both; HNSW improves over NSW on query scaling and the recall–latency trade-off.
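The insertion pipeline above can be sketched as follows. This is a deliberately simplified assumption-laden version: it finds neighbors by brute force on each layer, and it skips the neighbor-selection heuristic and back-link pruning that real implementations use:

```python
import math
import random
import numpy as np

def insert(node_id, vectors, layers, M=8, rng=random.random):
    """Simplified HNSW insert (sketch).

    layers: list of adjacency dicts, layers[0] = bottom layer.
    Assigns the node a random max layer, then links it to up to M
    neighbors on every layer from that max down to 0.
    """
    m_l = 1.0 / math.log(M)
    max_level = int(-math.log(rng()) * m_l)   # skip-list-style assignment
    while len(layers) <= max_level:           # grow hierarchy if needed
        layers.append({})
    for level in range(max_level + 1):
        graph = layers[level]
        # Naive neighbor search: M closest existing nodes on this layer.
        # Real HNSW uses a greedy search from an entry point instead.
        candidates = sorted(
            graph,
            key=lambda n: np.linalg.norm(vectors[n] - vectors[node_id]))
        neighbors = candidates[:M]
        graph[node_id] = list(neighbors)
        for nb in neighbors:
            graph[nb].append(node_id)  # bidirectional links, no pruning
```

Every node lands on layer 0, so the bottom layer always contains the full dataset—matching the "all nodes live at the bottom" property described above.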

Trade-offs and practical tips

Trade-off: HNSW uses more memory (each node stores links on each layer it belongs to) and has higher build cost (searches at each layer during insert) than NSW. In return you get sublinear query time and typically better recall for the same M. Practical tip: when reading or implementing graph-based ANN, think “NSW + hierarchy” to separate the small-world idea from the skip-list-style layers; use HNSW for any serious deployment.
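The memory overhead can be estimated with a back-of-envelope formula. The numbers below are assumptions for illustration (many implementations allow 2·M links on layer 0 and M per upper layer; 4 bytes per link id), not a formula from any specific library:

```python
def link_bytes_per_node(M, bytes_per_link=4):
    """Rough per-node link memory for HNSW (back-of-envelope sketch).

    Assumes 2*M links on layer 0 and up to M links on each upper
    layer. The expected number of upper layers a node appears on is
    sum_{k>=1} (1/M)^k = 1/(M-1), so upper layers add little.
    """
    return (2 * M + M / (M - 1)) * bytes_per_link
```

For M = 16 this comes to roughly 130 bytes of link storage per node before counting the vectors themselves—the hierarchy's extra layers are cheap, but the per-node link lists are why HNSW is memory-hungrier than a flat index.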

Frequently Asked Questions

Can I use NSW instead of HNSW?

You could for small datasets or teaching; for production, HNSW’s sublinear search and better recall make it the standard choice.

Is NSW simpler to implement?

Yes—one graph, one layer, no level assignment. But query and build don’t scale as well as HNSW.

Does HNSW always beat NSW on recall?

For the same M and similar build effort, HNSW typically gives better recall at lower latency because of the hierarchy.

Where did NSW come from?

It was the precursor to HNSW; the same authors proposed the hierarchical extension to get O(log n) search. See What is HNSW for the full picture.