Indexing Algorithms - HNSW · Topic 71

Entry point selection in HNSW

In HNSW, every search starts from an entry point—a single node, usually in the top layer of the graph. How that entry point is chosen affects how many hops the greedy search needs and whether the algorithm can reach the true nearest neighbors.

Summary

  • The entry point is often a fixed node (e.g. the first inserted element) or a dedicated entry vertex. A bad entry can force long traversals and hurt recall and latency. Some implementations use multiple entry points or a coarse structure to suggest a start.
  • Keep the entry point up to date when the graph is built incrementally or when nodes are deleted, so queries don’t start in obsolete regions.
  • Entry is typically on the top layer so the first hops use long-range links to “zoom” toward the query region before descending.
  • Pipeline: each query starts at the entry → greedy steps on top layer → drop to lower layers → refine on bottom; entry quality affects how many steps are needed.
  • Trade-off: fixed entry is simple and fast; per-query entry selection (e.g. via coarse index) can improve recall but adds cost and complexity.

Why entry point matters

Typically the entry point is a fixed node stored at index build time (e.g. the first inserted element or a dedicated “entry” vertex). For a small graph this is fine; for large or skewed data, a bad entry point can force long traversals and hurt recall and latency. The small-world property helps: even from a single entry, long-range links on the top layer usually allow the search to reach a relevant region in a few hops. But if the entry is in a “bad” part of the graph or the data has strong clusters, more hops or a larger efSearch may be needed.

Some implementations allow multiple entry points or use a lightweight auxiliary structure (e.g. a few centroids or a small coarse index) to suggest a good starting node per query, at the cost of extra lookups and complexity.
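Such a coarse selector can be sketched as follows. This is a minimal illustration, not any library's actual API: `centroids` and `entry_for_centroid` are hypothetical structures built alongside the index, mapping a few coarse regions to a representative top-layer node id each.

```python
import math

def l2(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_entry(query, centroids, entry_for_centroid):
    """Return the entry node id associated with the centroid nearest to `query`."""
    best = min(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    return entry_for_centroid[best]

# Usage: three coarse regions, each mapped to a node id on the top layer.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
entry_for_centroid = {0: 42, 1: 77, 2: 99}
print(pick_entry((9.0, 1.0), centroids, entry_for_centroid))  # → 77
```

The extra cost is one scan over the centroids per query; with only a handful of centroids this is cheap compared to the graph traversal it can shorten.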

Top layer and maintenance

Because the top layer has few nodes and long-range links, starting there lets the search quickly “zoom” toward the right region before descending to lower layers. The entry point is therefore usually chosen from the set of nodes that appear on the top layer. Keeping the entry point (or set of candidates) up to date when the graph is built incrementally or when nodes are deleted is important so that queries do not start in obsolete or disconnected regions.
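One common maintenance rule follows directly from how HNSW assigns layers at insertion time: each new node draws a level from an exponential distribution, and if that level exceeds the current top layer, the new node becomes the entry point. A sketch, assuming the level formula from the HNSW paper (`floor(-ln(U) * mL)` with `mL = 1/ln(M)`):

```python
import math
import random

def assign_level(m_l, rng):
    """Draw an HNSW insertion level; most nodes land on level 0."""
    return int(-math.log(rng.random()) * m_l)

rng = random.Random(0)
entry_point, top_level = None, -1
for node_id in range(1000):
    level = assign_level(m_l=1 / math.log(16), rng=rng)  # M = 16
    # Entry-point update rule: a node above the current top layer
    # becomes the new entry point for all subsequent queries.
    if level > top_level:
        entry_point, top_level = node_id, level

print(entry_point, top_level)
```

Because levels decay exponentially, only a handful of nodes ever trigger the update, so the top layer stays small and the entry point changes rarely.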

When the entry node is deleted, implementations must update the entry to another live node (e.g. first non-deleted in top layer) or use soft deletes so the node remains traversable but is skipped in results.
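The fallback on deletion can be sketched like this; `top_layer_nodes` is an illustrative ordered list of node ids on the top layer, and `deleted` is a set of soft-deleted ids (names are hypothetical, not from any particular library):

```python
def repair_entry(entry, top_layer_nodes, deleted):
    """If the entry is deleted, fall back to the first live top-layer node."""
    if entry not in deleted:
        return entry
    for node in top_layer_nodes:
        if node not in deleted:
            return node
    return None  # top layer fully deleted; caller must rebuild or descend a layer

print(repair_entry(5, [5, 9, 13], {5}))  # → 9
```

With soft deletes the graph links of node 5 remain usable during traversal, so running this repair lazily (at query time) is also an option.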

Pipeline: from query to first hop

The query pipeline: (1) resolve entry point (fixed or from auxiliary structure); (2) start greedy search from that node on the top layer; (3) take greedy steps until no improvement, then step down to the next layer; (4) repeat until the bottom layer and return top-k from the candidate set. The entry determines the starting region; the rest of the graph and efSearch determine how well the search can refine from there.
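The descent in steps (2)–(4) can be sketched on a toy graph. This is a simplified model, not a production implementation: `layers[l]` maps node id to neighbor ids on layer `l` (layer 0 is the bottom), and the bottom-layer candidate set is reduced to a single node (effectively efSearch = 1) to keep the greedy logic visible.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_descent(query, entry, layers, coords):
    """Greedy search from `entry` on the top layer down to the bottom layer."""
    current = entry
    for layer in reversed(range(len(layers))):  # start on the top layer
        improved = True
        while improved:  # greedy steps: move to any strictly closer neighbor
            improved = False
            for nbr in layers[layer].get(current, []):
                if dist(query, coords[nbr]) < dist(query, coords[current]):
                    current, improved = nbr, True
        # no improvement on this layer: drop down and keep refining
    return current

coords = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (9.0, 0.0), 3: (10.0, 0.0)}
layers = [
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},  # layer 0: short-range links
    {0: [2], 2: [0]},                        # layer 1: long-range links
]
print(greedy_descent((10.0, 0.0), entry=0, layers=layers, coords=coords))  # → 3
```

Note how the single long-range link on the top layer jumps from node 0 most of the way to the query before the bottom layer finishes the refinement; a real implementation would instead keep an efSearch-sized candidate heap on the bottom layer and return the top-k from it.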

Trade-offs and practical tips

Trade-off: a fixed entry is simple, has no per-query cost, and works well when the graph is well connected. Per-query entry selection can improve recall and reduce hops when data is clustered or the graph is large, but adds latency and implementation cost. Practical tip: for most deployments, a fixed entry (e.g. first inserted node on the top layer) is sufficient; if you see high variance in recall or latency, consider multiple entry points or a coarse selector.

Visualizing the layers of an HNSW index shows how few nodes sit on the top layer and where the entry point fits in; HNSW’s handling of clustered data illustrates when a single entry may be less ideal. Entry point selection also interacts with efSearch: a higher efSearch can partly compensate for a suboptimal entry by exploring more candidates.

Frequently Asked Questions

Can the entry point be chosen per query?

In standard HNSW it’s fixed. Some variants use a small auxiliary index (e.g. a few clusters) to pick a better start per query, at extra cost.

What if the entry point is deleted?

The entry must be reassigned to another live node (e.g. the first non-deleted node on the top layer), or soft deletes can keep the node traversable while skipping it in results.

Does entry point affect recall?

Yes. A start far from the query’s neighborhood may require more hops or a larger efSearch to reach true neighbors; in the worst case recall can drop.

Why not start at a random node?

Random could be very far from the query and waste hops. A fixed entry plus long-range top-layer links usually gets you “close enough” quickly.