Indexing Algorithms - HNSW · Topic 63

What is a Skip List and how does HNSW use it?

A skip list is a probabilistic data structure for ordered data: a linked list plus "express" layers with fewer elements. Search starts at the top layer, moves right until going further would overshoot, then drops a layer and repeats. Expected search cost is logarithmic in the list size. HNSW borrows this idea: not a literal skip list, but a hierarchy of graphs where upper layers have fewer nodes and long-range links.

Summary

  • A skip list uses multiple levels of linked lists so search can “skip” over many elements on upper levels, giving O(log n) expected search; HNSW borrows this multi-level idea for graph-based ANN.
  • In HNSW each vector gets a random maximum layer (fewer nodes at higher layers). Top layer = fast coarse navigation; bottom layer = all nodes and final greedy search—sublinear query time vs. single-layer NSW.
  • “Skip list” in HNSW means: multi-level structure + coarse-then-fine search, while preserving small-world connectivity at each level.
  • Layer assignment is typically probabilistic (e.g. 1/M per extra layer); entry-point selection starts at the top layer so the first hops cover large graph distance with few steps.
  • Trade-off: more layers improve navigation speed but increase memory (nodes store more links) and build cost; in practice the expected number of layers grows slowly with dataset size.

What a skip list is (classic structure)

A skip list is a probabilistic data structure for ordered key lookup. You have a sorted linked list (level 0) plus one or more “express” layers: each node randomly promotes to the next layer with some probability, so higher layers have fewer nodes. To search, you start at the top layer and move right until the next step would overshoot the target; then you drop down one layer and repeat. Expected search cost is O(log n) in the number of elements.

The key idea is coarse-to-fine: the top level lets you skip large stretches of the list; lower levels add detail. HNSW does not store ordered keys, but it reuses this coarse-to-fine idea over a graph of vectors.
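The classic search procedure can be sketched in a few lines of Python. This is a toy illustration (fixed level cap, numeric keys), not a production container:

```python
import random

class SkipList:
    """Minimal skip list: sorted level-0 list plus sparser 'express' levels."""
    MAX_LEVEL = 8
    P = 0.5  # promotion probability per extra level

    class Node:
        def __init__(self, key, level):
            self.key = key
            self.forward = [None] * (level + 1)  # one next-pointer per level

    def __init__(self):
        self.head = self.Node(float("-inf"), self.MAX_LEVEL)
        self.level = 0  # highest level currently in use

    def _random_level(self):
        # Promote with probability P per level: higher levels get fewer nodes.
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        # Descend from the top, recording the rightmost node < key on each level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = self.Node(key, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        # Move right until the next step would overshoot, then drop a level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```

The `contains` loop is the part HNSW reuses conceptually: coarse moves on sparse upper levels, refinement below.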

Layer assignment and search

In HNSW, each vector is assigned to a random maximum layer (e.g. with exponential decay: fewer nodes at higher layers). The top layer has very few nodes—like the top of a skip list—so search quickly gets into the right region. When you “drop” to the next layer, you have more nodes and more local links, so you refine the position. At the bottom layer, all vectors exist and you do the final greedy or beam search. So “skip list” in HNSW means: multi-level structure + fast coarse navigation + fine search at the bottom.

The probability of assigning a node to layer ℓ is often 1/M^ℓ (or similar), so the expected number of nodes per layer decreases exponentially. That keeps the top layers sparse and the bottom layer complete, which is exactly what you need for fast descent.
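That assignment rule can be sketched as follows, using the normalization mL = 1/ln(M) from the original HNSW formulation; the function name and the 100,000-sample check are just for illustration:

```python
import math
import random

def sample_max_layer(M=16, rng=random):
    """Draw a node's maximum layer so that P(layer >= L) ~ (1/M)^L."""
    m_l = 1.0 / math.log(M)  # normalization factor mL = 1/ln(M)
    return int(-math.log(rng.random()) * m_l)

random.seed(0)
levels = [sample_max_layer() for _ in range(100_000)]
frac_bottom_only = sum(1 for l in levels if l == 0) / len(levels)
# With M = 16, roughly 15/16 of nodes should live only on the bottom layer,
# and layers above ~4 should be nearly empty at this sample size.
```

The exponential decay is what keeps the upper layers sparse: each extra layer retains only about a 1/M fraction of the nodes below it.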

Why it beats single-layer NSW

In a single-layer NSW graph, search is purely greedy on one graph: you start at an entry point and follow nearest-neighbor edges until you cannot improve. That can take many hops, especially in large graphs. HNSW adds hierarchy: you start at the top layer, take a few long jumps to get near the query region, then step down and repeat. The hierarchy reduces the number of distance computations compared to a single-layer NSW graph, while still preserving small-world connectivity at each level.
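For concreteness, the single-layer greedy walk that NSW relies on might look like this; the adjacency dict and the 1-D toy graph are hypothetical:

```python
def greedy_search(neighbors, dist, entry, query):
    """NSW-style greedy walk: keep hopping while some neighbor is closer.

    neighbors: dict node -> list of adjacent nodes
    dist: function (node, query) -> float
    """
    current = entry
    current_d = dist(current, query)
    improved = True
    while improved:
        improved = False
        for cand in neighbors[current]:
            d = dist(cand, query)
            if d < current_d:
                current, current_d = cand, d
                improved = True
    return current

# Toy 1-D example: nodes on a line, edges only to immediate neighbors.
# Without long-range links, reaching node 3 from node 0 takes one hop per node.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
nearest = greedy_search(neighbors, lambda n, q: abs(n - q), entry=0, query=3)
```

On this path graph every hop advances by one node, which is exactly the many-hops problem the HNSW hierarchy addresses.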

So HNSW achieves sublinear query time not only because of small-world structure, but because the top layers act as an express network—analogous to the express lanes in a skip list—before you do the fine-grained search on the bottom.

Pipeline: from insertion to query

At insert time, a new vector gets a random maximum layer. It is linked to up to M neighbors per layer, from its max layer down to 0, using the same small-world connection rule (implementations commonly allow up to 2M links on the bottom layer). At query time, the pipeline is: pick an entry point (often from the top layer), greedily move to the best neighbor on the current layer; when no better neighbor exists, step down to the next layer and continue; on the bottom layer, run greedy or beam search (e.g. with efSearch candidates) to collect the top-k. The skip-list-style layers reduce the number of hops before the bottom-layer refinement.
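The query side of that pipeline can be sketched over a toy two-layer graph; `hnsw_query`, the `ef` parameter name, and the 1-D example are illustrative, not a library API:

```python
import heapq

def hnsw_query(layers, dist, entry, query, ef=4, k=2):
    """Coarse-to-fine descent over a layered graph (sketch only).

    layers: list of adjacency dicts; layers[0] is the bottom (all nodes).
    """
    current = entry
    # Greedy hops on every layer above the bottom.
    for layer in reversed(layers[1:]):
        improved = True
        while improved:
            improved = False
            for cand in layer.get(current, []):
                if dist(cand, query) < dist(current, query):
                    current, improved = cand, True
    # Best-first beam search with up to ef kept candidates on the bottom layer.
    visited = {current}
    candidates = [(dist(current, query), current)]   # min-heap of frontier
    best = [(-dist(current, query), current)]        # max-heap, size <= ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # closest frontier node is worse than the worst kept result
        for cand in layers[0].get(node, []):
            if cand in visited:
                continue
            visited.add(cand)
            cd = dist(cand, query)
            if len(best) < ef or cd < -best[0][0]:
                heapq.heappush(candidates, (cd, cand))
                heapq.heappush(best, (-cd, cand))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-d, n) for d, n in best)[:k]

# Toy graph: bottom layer is a path over 8 nodes; the sparse top layer
# contains only nodes 0 and 4, giving one long-range jump.
layers = [
    {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)},
    {0: [4], 4: [0]},
]
result = hnsw_query(layers, lambda n, q: abs(n - q), entry=0, query=6.5)
```

Starting at node 0, the top layer jumps straight to node 4, and the bottom-layer beam then refines to the true nearest neighbors of 6.5.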

Trade-offs and practical tips

More layers improve the potential for fast navigation but increase memory (each node stores links on each layer it belongs to) and slightly increase build cost. The expected maximum layer grows as O(log n), so in practice the layer count stays manageable. If the top layer has too few nodes, entry-point choice matters more; if it has too many, you lose the “express” benefit.

Practical tip: use the default layer-assignment scheme (e.g. 1/M) unless you have a specific need. For very large datasets, ensure the top layer is still sparse enough that a few hops from the entry point can reach the right region; otherwise consider a larger M or revisiting entry-point selection.
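A quick back-of-the-envelope check of that growth rate, assuming the (1/M)^ℓ assignment from earlier (the function name is illustrative):

```python
import math

def expected_max_layer(n, M=16):
    """Rough estimate of the top occupied layer: it sits roughly where
    n * (1/M)**L drops to about 1, i.e. L ~ ln(n) / ln(M)."""
    return math.log(n) / math.log(M)

# One million vectors with M = 16 gives about 5 layers;
# a billion vectors still stays under 8.
```

This is why layer count is rarely a practical concern: going from 10^6 to 10^9 vectors adds only a couple of layers.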

Frequently Asked Questions

How is the maximum layer chosen for a new node?

Usually at random with exponential decay (e.g. probability 1/M for each additional layer). So most nodes are only on the bottom layer; few appear on many layers.

Is HNSW literally a skip list?

No. Skip lists are for totally ordered keys. HNSW is a hierarchy of graphs over vectors; the “skip” idea is the multi-level, coarse-to-fine traversal.

Why not use only the bottom layer?

Then you have plain NSW: search may take many hops. The top layers reduce hops by quickly narrowing the region before descending.

Does layer count grow with dataset size?

In theory the number of layers is O(log n). In practice the random assignment keeps it small: the expected maximum layer is roughly ln(n)/ln(M), so it grows very slowly with n.