Indexing Algorithms - HNSW · Topic 66

Defining “efSearch” and its impact on query speed

efSearch is the size of the dynamic candidate list used during HNSW query search. At each layer, the algorithm keeps the best efSearch candidates and expands from them. Larger efSearch means more candidates are explored, so recall generally improves, but more distance computations and more heap operations mean higher latency and lower throughput.

Summary

efSearch caps the candidate list during query-time greedy search; it is the main query-time lever for recall vs. latency and does not affect index build.
Does not affect build—only query time. Can often be changed per query or per tenant. Rule of thumb: efSearch ≥ k; often 2–10× k for stable recall.
Main lever for recall–latency together with M and graph quality from efConstruction.
Pipeline: start from entry point, expand neighbors into a priority queue of size efSearch, follow best candidates layer by layer until the bottom layer returns top-k from the candidate set.
Trade-off: higher efSearch improves recall and consistency but increases latency and temporary memory per query; tune to the smallest value that meets your recall target.

What efSearch does at query time

During an HNSW query, the search maintains a dynamic list of the best candidates seen so far—typically a priority queue ordered by distance to the query. The maximum size of this list is efSearch. At each step, the algorithm expands neighbors of the current best candidates and adds them to the queue, then keeps only the top efSearch by distance. When the search moves to a lower layer or terminates, the top-k results are taken from this candidate set. So efSearch directly controls how much of the graph is explored: larger values mean more nodes are considered and recall tends to go up, at the cost of more distance computations and heap operations.

efSearch does not change the index structure; it only affects the query algorithm. So you can build the index once and vary efSearch per request or per deployment to balance recall and latency.

Query-time only

Unlike efConstruction, efSearch does not affect index build—only query time. So you can build once with high efConstruction for a good graph, then at query time choose efSearch based on whether you need maximum recall (e.g. 500–1000) or low latency (e.g. 64–128). Many systems allow changing efSearch per query or per tenant without rebuilding the index.

This makes efSearch ideal for multi-tenant or mixed workload scenarios: strict latency SLAs can use a lower efSearch; batch or offline jobs can use a higher value for better recall.

Pipeline: search and candidate list

The query pipeline starts at an entry point (often on the top layer), then at each layer the algorithm greedily moves to the best neighbor and expands the candidate set. The candidate list is capped at efSearch. When no better neighbor is found on the current layer, the search steps down to the next layer and continues. On the bottom layer, expansion continues until the search terminates (e.g. when the best candidate cannot be improved); the top-k are then returned from the efSearch-sized candidate set. So the size of that set controls how many nodes are ever considered and thus recall and latency.

Choosing efSearch

The rule of thumb is efSearch ≥ k (the number of neighbors requested); often efSearch is set to a multiple of k (e.g. 2–10×) to get stable recall. If efSearch is smaller than k, you cannot return k neighbors. In practice, 2×k to 10×k is common; very high recall requirements may need efSearch in the hundreds or more.

The recall–latency trade-off in HNSW is largely controlled by this parameter, along with M and the quality of the graph built with efConstruction. A well-built graph (high efConstruction) often needs a lower efSearch for the same recall, because the links are better and the search finds good candidates sooner.

Trade-offs and practical tips

Trade-off summary: efSearch trades recall for latency and temporary memory. Higher efSearch improves recall and reduces variance across queries but increases per-query cost. Practical tip: plot recall@k and latency (e.g. p50, p99) for a range of efSearch values on your data; pick the smallest efSearch that meets your recall target so that latency and throughput stay within budget. For real-time APIs, start with a conservative efSearch (e.g. 64–128) and increase only if recall is insufficient.

Frequently Asked Questions

Can I use a different efSearch per request?

Yes. Many vector DBs accept efSearch in the query API so you can trade recall vs. latency per request (e.g. strict latency for real-time, higher efSearch for batch).

What if efSearch is smaller than k?

You can’t return k neighbors from a candidate list smaller than k. Set efSearch ≥ k; in practice often 2×k or more for good recall.

Does efSearch affect memory at query time?

Yes. The candidate list (priority queue) holds up to efSearch entries; larger efSearch means more temporary memory per query.

How do I tune efSearch?

Plot recall@k and latency for a range of efSearch values on your data; pick the smallest efSearch that meets your recall target. See recall–latency trade-off curve.