Indexing Algorithms - IVF & Quantization · Topic 85

What is nprobe (number of clusters to search)?

nprobe is the number of Voronoi cells (clusters) that an IVF index searches at query time. For each query, the system finds the nprobe nearest centroids to the query vector and then performs nearest neighbor search only among the vectors in those clusters. Increasing nprobe improves recall but increases the number of vectors scanned and thus query latency.
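The two query-time steps described above can be sketched in a few lines of pure Python. This is a toy illustration, not any particular library's implementation: the helper names (`sq_dist`, `assign_to_lists`, `build_ivf`, `ivf_search`), the small k-means loop, and the random data are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_lists(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def build_ivf(vectors, nlist, n_iter=5, seed=0):
    """Toy build step: a few k-means iterations, then the final assignment."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(n_iter):
        lists = assign_to_lists(vectors, centroids)
        for c, ids in enumerate(lists):
            if ids:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(vectors[i][j] for i in ids) / len(ids)
                                for j in range(dim)]
    return centroids, assign_to_lists(vectors, centroids)

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Return (top-k ids, number of vectors scanned) for one query."""
    # Step 1: rank centroids by distance to the query, keep the nprobe nearest.
    ranked = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    probed = ranked[:nprobe]
    # Step 2: score only the vectors stored in the probed clusters.
    candidates = [i for c in probed for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k], len(candidates)

rng = random.Random(1)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids, lists = build_ivf(vectors, nlist=10)
query = [rng.random() for _ in range(8)]

for nprobe in (1, 3, 10):
    ids, scanned = ivf_search(query, vectors, centroids, lists, nprobe, k=5)
    print(f"nprobe={nprobe:2d}  scanned={scanned:3d}  top5={ids}")
```

With nprobe equal to nlist every cluster is probed, so the scan covers all 500 vectors and the result matches an exhaustive search; smaller values scan fewer vectors and may miss neighbors that live in unprobed cells.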

Summary

  • Query-time parameter (no rebuild). Range 1 to nlist; often a few to a few hundred. More probed clusters → more vectors scored → higher latency and usually higher recall.
  • Together with nlist, the main lever for moving along the recall–latency curve of an IVF index.
  • Pipeline: query vector → find nprobe nearest centroids → search only vectors in those clusters (flat or with PQ); nprobe can be set per request.
  • Trade-off: higher nprobe improves recall but increases latency; lower nprobe reduces work and latency but may miss neighbors in unprobed cells.
  • Practical tip: sweep nprobe (e.g. 1, 4, 16, 64) and plot recall@k vs. latency; choose the smallest nprobe that meets the recall target. Combining IVF and PQ (IVFPQ) keeps nprobe as the main query-time lever.

Using nprobe

nprobe is a query-time parameter: you can change it per query or set a default without rebuilding the index (unlike nlist, which is fixed at build time). Values range from 1 (fastest, lowest recall) up to nlist, which searches every cluster and is therefore equivalent to an exhaustive scan of all indexed vectors. In practice nprobe is often set between a few and a few hundred; the right value depends on your recall@k and latency requirements and on how many clusters you have (nlist).
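As a rough illustration of "query-time, no rebuild", the sketch below keeps the centroids and inverted lists fixed while nprobe is changed as a default or overridden per call. `ToyIVFIndex` is a hypothetical class over 1-D points written for this example; real libraries expose the setting under their own names and APIs. Clamping out-of-range values to 1..nlist is a design choice of this sketch.

```python
class ToyIVFIndex:
    """Hypothetical IVF index over 1-D points, for illustration only."""

    def __init__(self, centroids, lists, default_nprobe=1):
        self.centroids = centroids    # fixed at build time; len(centroids) == nlist
        self.lists = lists            # lists[c] holds the points assigned to centroid c
        self.nprobe = default_nprobe  # query-time default, can be changed at any time

    def search(self, query, k, nprobe=None):
        # A per-call nprobe overrides the default; clamp to the valid range 1..nlist.
        n = self.nprobe if nprobe is None else nprobe
        n = max(1, min(n, len(self.centroids)))
        ranked = sorted(range(len(self.centroids)),
                        key=lambda c: abs(query - self.centroids[c]))
        candidates = [p for c in ranked[:n] for p in self.lists[c]]
        return sorted(candidates, key=lambda p: abs(p - query))[:k]


index = ToyIVFIndex(
    centroids=[2.0, 12.0, 22.0],
    lists=[[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]],
)

print(index.search(7.4, k=2))            # default nprobe=1: only the nearest cluster
index.nprobe = 2                         # new default; the index data is untouched
print(index.search(7.4, k=2))            # now also sees the neighbouring cluster
print(index.search(7.4, k=2, nprobe=1))  # per-request override back to 1
```

The query 7.4 sits between two clusters: with nprobe=1 the point 4 is missed because its cluster is not probed, while nprobe=2 recovers it.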

Trade-off

The trade-off is direct: more probed clusters mean more vectors to score, so higher latency and often higher recall; fewer probed clusters mean less work and lower recall. Tuning nprobe (and nlist) is the main way to move along the recall–latency curve for IVF-based indexes.
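A back-of-the-envelope estimate makes the "more clusters, more vectors to score" relationship concrete. The function below is a sketch under two stated assumptions: clusters are roughly equal in size, and the query is compared against all nlist centroids by brute force. The function name and the example numbers are illustrative, and real cluster sizes are usually uneven.

```python
def estimated_distance_computations(n_vectors, nlist, nprobe):
    """Rough per-query cost, assuming equally sized clusters and a
    brute-force comparison against all nlist centroids."""
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be between 1 and nlist")
    centroid_comparisons = nlist
    vectors_scanned = n_vectors * nprobe / nlist
    return centroid_comparisons + vectors_scanned

# Example: one million vectors split into 1024 clusters.
for nprobe in (1, 16, 64, 1024):
    cost = estimated_distance_computations(1_000_000, 1024, nprobe)
    print(f"nprobe={nprobe:4d}  ~{round(cost):,} distance computations")
```

Under these assumptions the scanned fraction is simply nprobe / nlist, so cost grows about linearly in nprobe until it reaches the full scan at nprobe = nlist.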

Practical tip: many vector DBs allow nprobe per query, so you can use a low nprobe for strict-latency real-time requests and a higher nprobe for batch or offline evaluation. To guide tuning, measure recall@k and latency percentiles (p50, p99) at each setting and read the choice off the resulting recall–latency curve.
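One possible way to collect the latency side of that curve is sketched below, using only the standard library. `percentile` uses the nearest-rank definition (one of several common conventions), and `fake_search` is a stand-in workload, not a real index call; swap in your own search function at a fixed nprobe.

```python
import math
import time

def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def time_calls(fn, inputs):
    """Return one wall-clock latency (in seconds) per call of fn(x)."""
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)
    return latencies

# Stand-in workload (assumption); replace with a real search at a fixed nprobe.
def fake_search(n):
    return sum(i * i for i in range(n))

latencies = time_calls(fake_search, [10_000] * 200)
p50 = percentile(latencies, 50) * 1e3
p99 = percentile(latencies, 99) * 1e3
print(f"p50={p50:.3f} ms  p99={p99:.3f} ms")
```

Repeating this for each nprobe value gives one (p50, p99) pair per setting to plot against recall@k.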

Frequently Asked Questions

Can nprobe be larger than nlist?

No. You can’t probe more clusters than exist. Maximum nprobe = nlist (search all clusters).

How do I pick nprobe for my recall target?

Sweep nprobe (e.g. 1, 4, 16, 64, …) and measure recall@k and latency on a validation set; choose the smallest nprobe that meets the recall target.
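The sweep could look roughly like the sketch below. Everything here is an assumption for illustration: the 1-D toy index (100 integer points in 10 clusters), the three validation queries, and the helper names `search`, `exact`, `sweep_nprobe`, and `smallest_nprobe_meeting`. In practice `search` would call your vector index and `exact` would be a precomputed brute-force ground truth.

```python
import time

NLIST, K = 10, 4
points = list(range(100))
centroids = [10 * c + 4.5 for c in range(NLIST)]
lists = [[p for p in points if p // 10 == c] for c in range(NLIST)]

def search(query, nprobe):
    """Toy IVF search: probe the nprobe nearest clusters, return the top K."""
    ranked = sorted(range(NLIST), key=lambda c: abs(query - centroids[c]))
    candidates = [p for c in ranked[:nprobe] for p in lists[c]]
    return sorted(candidates, key=lambda p: abs(p - query))[:K]

def exact(query):
    """Ground truth: brute-force top K over all points."""
    return sorted(points, key=lambda p: abs(p - query))[:K]

def sweep_nprobe(queries, nprobe_values):
    """Return (nprobe, mean recall@K, mean latency in seconds) per setting."""
    truth = [set(exact(q)) for q in queries]
    rows = []
    for nprobe in nprobe_values:
        start = time.perf_counter()
        found = [set(search(q, nprobe)) for q in queries]
        latency = (time.perf_counter() - start) / len(queries)
        recall = sum(len(f & t) / K for f, t in zip(found, truth)) / len(queries)
        rows.append((nprobe, recall, latency))
    return rows

def smallest_nprobe_meeting(rows, target_recall):
    """Smallest swept nprobe whose recall reaches the target, else None."""
    for nprobe, recall, _ in sorted(rows):
        if recall >= target_recall:
            return nprobe
    return None

rows = sweep_nprobe(queries=[9.9, 29.9, 44.4], nprobe_values=(1, 2, 4))
for nprobe, recall, latency in rows:
    print(f"nprobe={nprobe}  recall@{K}={recall:.2f}  latency={latency * 1e6:.1f} us")
print("chosen nprobe:", smallest_nprobe_meeting(rows, target_recall=0.95))
```

Two of the three queries lie near a cluster boundary, so nprobe=1 misses half of their true neighbors; in this toy setup nprobe=2 is the smallest swept value that reaches the 0.95 target.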

Does nprobe affect build?

No. Build only computes centroids and assigns vectors to clusters. nprobe is used only at query time.

Can different queries use different nprobe?

Yes. Many APIs accept nprobe per request so you can trade recall vs. latency per query (e.g. strict latency for real-time, higher nprobe for batch).