Indexing Algorithms - HNSW · Topic 64

Defining “M” (number of bi-directional links) in HNSW

In HNSW, M is the maximum number of bi-directional links (neighbors) each node can have in the graph. When a new node is inserted, the algorithm finds its M nearest neighbors in the current graph and creates edges to them (and from them back to the new node, hence bi-directional). M is the primary lever in the trade-off between memory, recall, and build cost.
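As a toy, single-layer sketch (brute-force neighbor search stands in for the real layered graph search, and all names here are invented for illustration), insertion with bi-directional links capped at M might look like:

```python
import heapq

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def insert(graph, points, new_id, M):
    """Link new_id to up to M nearest existing nodes (brute force for clarity;
    real HNSW finds candidates via graph search controlled by efConstruction)."""
    nearest = heapq.nsmallest(M, graph, key=lambda n: dist2(points[n], points[new_id]))
    graph[new_id] = list(nearest)
    for n in nearest:
        graph[n].append(new_id)   # back-link: edges are bi-directional
        if len(graph[n]) > M:     # cap degree at M, keeping the closest links
            graph[n] = heapq.nsmallest(M, graph[n],
                                       key=lambda x: dist2(points[n], points[x]))

# Build a tiny graph of 6 points on a line with M = 2
points = {i: (float(i), 0.0) for i in range(6)}
graph = {}
for i in points:
    insert(graph, points, i, M=2)
```

The pruning step when a neighbor's list overflows mirrors the "shrink connections" step in real implementations: adding back-links can push an existing node over M, so its list is trimmed back to the closest M.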

Summary

  • M is the maximum number of bi-directional neighbors per node; it directly sets graph degree and drives memory, build time, and query-time expansion.
  • Higher M → denser graph, better recall, more memory and build time. Lower M → sparser graph, faster build/query, possible recall drop. Typical range 4–64.
  • Tune together with efConstruction and efSearch; pruning strategies cap the number of links after selection, and the recall–latency trade-off depends on M and efSearch.
  • Implementations typically use M on the upper layers and a larger cap, often called M0 (commonly 2·M), on the bottom layer: the upper layers stay sparse for fast navigation, while the bottom layer, which holds every node, keeps denser links.
  • Pipeline: at insert, find candidate neighbors (via efConstruction), select up to M, create bi-directional edges; at query, each hop expands up to M neighbors per node.

What M controls

M caps how many neighbors each node can store in the HNSW graph on each layer where the node appears. When a new vector is inserted, the algorithm searches for its approximate nearest neighbors in the current graph, then adds edges to (and from) at most M of them. Because edges are bi-directional, increasing M increases both out-degree and in-degree: more paths for search, but more memory and more work per hop.

M is fixed at index build time and applies to every layer (though many implementations allow a larger cap, often called M0 and typically 2·M, on the bottom layer). It does not change at query time; query-time behavior is tuned by efSearch instead.

Effect of M on graph and search

Higher M means more connections per node: the graph is denser, so search can follow more paths and typically achieves better recall and more robust traversal. But it also means more memory (each node stores more neighbor IDs), longer index build time (more candidates to consider when connecting new nodes), and often more distance computations during search.
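A rough back-of-envelope (assuming 4-byte neighbor IDs and a bottom-layer cap of 2·M, a common default; ignoring the sparse upper layers and the vector storage itself) shows how link storage scales linearly with M:

```python
def link_memory_bytes(n, M, id_bytes=4, bottom_factor=2):
    """Rough lower bound on neighbor-list storage for an HNSW index.
    Assumes the bottom layer allows bottom_factor * M links (a common default)
    and ignores the sparse upper layers, which hold few nodes."""
    return n * bottom_factor * M * id_bytes

# Doubling M roughly doubles link storage
one_million = 1_000_000
m16 = link_memory_bytes(one_million, 16)   # ~128 MB of links
m32 = link_memory_bytes(one_million, 32)   # ~256 MB of links
```

Note this counts links only; the raw vectors usually dominate at low M and moderate dimensionality, but the link overhead becomes significant as M grows.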

Lower M saves memory and speeds up build and query, but the graph is sparser and recall can drop if there aren’t enough “bridge” links between clusters. In high dimensions or with many natural clusters, too low an M can leave some regions poorly connected, so greedy search may not reach the true nearest neighbors.

Pipeline: build and query

At build time, for each new node the algorithm runs an approximate search to find candidate neighbors, then selects up to M of them (often with a pruning strategy to favor diverse or close neighbors) and creates bi-directional edges. The candidate set size is controlled by efConstruction; M is the cap on how many edges are actually kept. At query time, when the search visits a node, it can expand up to M neighbors at that layer—so higher M increases the branching factor and the number of distance computations per hop.
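The query-time loop can be sketched on a single layer (names invented; real HNSW first descends through the upper layers to a good entry point): the result list is bounded by efSearch, while the work per hop is bounded by the node's degree, i.e. by M.

```python
import heapq

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_layer(graph, points, query, entry, ef):
    """Best-first search keeping the ef closest results seen so far (efSearch)."""
    visited = {entry}
    d0 = dist2(points[entry], query)
    candidates = [(d0, entry)]        # min-heap: closest unexpanded node first
    results = [(-d0, entry)]          # max-heap of the current top-ef results
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:        # nothing closer left to expand
            break
        for nb in graph[node]:        # at most M (or M0) neighbors per hop
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist2(points[nb], query)
            if len(results) < ef or nd < -results[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(results, (-nd, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-nd, nb) for nd, nb in results)

# Chain graph over points on a line: each node links to its two neighbors
points = {i: (float(i),) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
hits = search_layer(graph, points, query=(7.2,), entry=0, ef=3)
```

The inner `for nb in graph[node]` loop is where M shows up at query time: each expanded node contributes up to M distance computations.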

Typical values and tuning

Typical values are in the range 4–64 depending on dimensionality and dataset size. Low dimensions (e.g. < 100) often work well with M 12–24; high dimensions or difficult datasets may need M 32–64 for good recall. Start with a default (e.g. 16) and measure recall and latency; increase M if recall is below target and you can afford the memory and build cost.

Tuning M often goes together with efSearch and pruning strategies to hit a target recall–latency curve. Increasing efConstruction can improve the quality of the M neighbors chosen at build time, which can let you get away with a slightly lower M for the same recall—or improve recall for the same M.
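A toy benchmark illustrates the measurement loop (single-layer graph, plain greedy descent instead of real HNSW search; the absolute recall numbers are illustrative only, but the harness shape carries over to a real index):

```python
import heapq, random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build(points, M):
    """Toy single-layer graph: greedy insert, degree capped at M."""
    graph = {}
    for pid, p in points.items():
        nearest = heapq.nsmallest(M, graph, key=lambda n: dist2(points[n], p))
        graph[pid] = list(nearest)
        for n in nearest:
            graph[n].append(pid)
            if len(graph[n]) > M:
                graph[n] = heapq.nsmallest(M, graph[n],
                                           key=lambda x: dist2(points[n], points[x]))
    return graph

def greedy_nn(graph, points, q, entry):
    """Plain greedy descent (ef = 1): follow the best neighbor until stuck."""
    cur = entry
    while True:
        nxt = min(graph[cur], key=lambda n: dist2(points[n], q), default=cur)
        if dist2(points[nxt], q) >= dist2(points[cur], q):
            return cur
        cur = nxt

random.seed(0)
points = {i: (random.random(), random.random()) for i in range(200)}
queries = [(random.random(), random.random()) for _ in range(50)]
for M in (2, 4, 8):
    graph = build(points, M)
    true = [min(points, key=lambda n: dist2(points[n], q)) for q in queries]
    found = [greedy_nn(graph, points, q, entry=0) for q in queries]
    recall = sum(t == f for t, f in zip(true, found)) / len(queries)
    print(f"M={M}: recall@1={recall:.2f}")
```

With a real library, the same sweep would rebuild the index per candidate M and measure recall against brute-force ground truth on a held-out query set.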

Trade-offs and practical tips

Trade-off summary: M balances connectivity (recall) against memory and compute. Too low an M hurts recall and can create “bottleneck” nodes; too high an M increases memory footprint and query latency without always improving recall much. Practical tip: profile memory consumption of HNSW indexes so you know how much each increment of M costs; then fix a recall target and choose the smallest M (and efSearch) that achieves it to keep latency and cost down.

Pipeline summary: set M at index creation (e.g. 16 or 32); combine with efConstruction so that the candidate pool for selecting M neighbors is large enough. At query time, efSearch controls exploration; M fixes the branching factor per node. Entry point selection and pruning strategies interact with M—fewer but higher-quality links can sometimes match recall of a higher M with less memory.
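The "fewer but higher-quality links" idea is usually implemented with a diversity heuristic: scanning candidates closest-first, a candidate is kept only if it is nearer to the new node than to every neighbor already kept, which favors links that point in different directions. A sketch (function and variable names invented for illustration):

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_neighbors(points, new_id, candidates, M):
    """Heuristic neighbor selection: scan candidates closest-first and keep one
    only if it is nearer to new_id than to any already-kept neighbor."""
    kept = []
    for c in sorted(candidates, key=lambda c: dist2(points[c], points[new_id])):
        d_new = dist2(points[c], points[new_id])
        if all(d_new < dist2(points[c], points[k]) for k in kept):
            kept.append(c)
        if len(kept) == M:
            break
    return kept

# Node 0 sits between a tight cluster (1, 2) and a lone point (3).
points = {0: (0.0,), 1: (1.0,), 2: (1.05,), 3: (-2.0,)}
# Closest-only selection with M=2 would pick [1, 2], both from the same cluster;
# the heuristic keeps one link per direction instead.
picked = select_neighbors(points, 0, [1, 2, 3], M=2)
```

This is why a well-pruned graph at a given M can match the recall of a naively connected graph at a higher M: the cap is spent on bridges rather than redundant intra-cluster links.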

Frequently Asked Questions

Can M be different per layer?

Yes. Implementations commonly use M on the upper layers and a larger M0 (often 2·M) on the bottom layer: the upper “express” layers stay sparse to keep memory and hop cost low, while the bottom layer, which contains every node, gets denser links to preserve recall.

What if I set M too low?

Recall can drop: not enough links to bridge clusters or escape local minima during greedy search. Increase M or efSearch to compensate.

Does M affect query latency?

Yes. More links per node means more neighbors to expand at each step, so higher M tends to increase query latency as well as memory.

How do I choose M for my dimension?

Higher dimensions often need larger M for the same recall. Start with 16–32 and plot recall vs. M and efSearch on a validation set; tune to meet your recall and latency targets.