
Cost per query (CPQ)

Cost per query (CPQ) is the total cost to serve a single vector search request, including compute, memory, I/O, and (in the cloud) instance and network costs. It is a key metric for capacity planning and for comparing managed vs. self-hosted vector databases (VDBs).

Summary

Cost per query (CPQ): total cost to serve a single vector search request (compute, memory, I/O, instance, network). Key for capacity planning and comparing managed vs. self-hosted VDBs.
CPQ = infrastructure cost over a period / number of queries served in that period. Main factors: latency, throughput (higher QPS spreads fixed costs over more queries), index type, and filtering or hybrid search. Spot instances can lower CPQ for batch workloads. Tuning efSearch, nprobe, and index choice changes CPQ. Practical tip: measure at steady-state QPS, and compare managed vs. self-hosted at the same recall and latency targets.

Calculating CPQ

CPQ is the infrastructure cost over a period divided by the number of queries served in that period. The main factors are latency (longer queries tie up resources longer), throughput (higher QPS spreads fixed costs over more queries), index type (e.g. HNSW vs. IVF), and whether queries use filtering or hybrid search, both of which add CPU work. Spot or preemptible instances can lower CPQ for batch or delay-tolerant workloads.

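In practice: sum all infrastructure cost over the period, then divide by the number of queries served in that period. Practical tip: measure at steady-state QPS, because fixed costs keep accruing during idle or warm-up windows and inflate the result. When comparing managed vs. self-hosted, hold both to the same recall and latency targets so the comparison covers equivalent work.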
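The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool: the function names and all dollar and query figures are hypothetical, chosen only to show the division and how a higher sustained QPS spreads a fixed hourly cost.

```python
def cost_per_query(total_cost: float, query_count: int) -> float:
    """Infrastructure cost over a period divided by queries served in it."""
    if query_count <= 0:
        raise ValueError("query_count must be positive")
    return total_cost / query_count


def cpq_from_hourly_cost(hourly_cost: float, sustained_qps: float) -> float:
    """CPQ for a fixed hourly cost at a steady-state query rate."""
    if sustained_qps <= 0:
        raise ValueError("sustained_qps must be positive")
    queries_per_hour = sustained_qps * 3600
    return hourly_cost / queries_per_hour


# Hypothetical figures: $1,200 of infrastructure cost, 30M queries in the month.
monthly_cpq = cost_per_query(1200.0, 30_000_000)
print(f"{monthly_cpq:.6f}")  # 0.000040

# Same hypothetical $2/hour deployment at two steady-state loads:
# ten times the QPS gives one tenth of the CPQ, since the cost is fixed.
low_load_cpq = cpq_from_hourly_cost(2.0, 50)
high_load_cpq = cpq_from_hourly_cost(2.0, 500)
```

The second helper assumes the cost is fully fixed over the hour. Usage-based charges such as network egress would need to be added to the numerator separately.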

Optimizing CPQ

Managed services often price by query or by node-hour; knowing your QPS and latency profile lets you convert either pricing model into an effective CPQ. Tuning efSearch (HNSW), nprobe (IVF), and index choice directly affects both latency and CPU use, and therefore CPQ.
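As a rough sketch of that mapping, the Python below converts node-hour pricing into an effective CPQ and finds the average QPS at which it matches a per-query price. All prices, the per-node throughput, and the function names are hypothetical assumptions for illustration; real per-node throughput depends on the index and on settings such as efSearch or nprobe, and should come from your own benchmark.

```python
import math


def nodes_needed(peak_qps: float, qps_per_node: float) -> int:
    """Nodes required to serve peak load at a measured per-node throughput."""
    return max(1, math.ceil(peak_qps / qps_per_node))


def effective_cpq(node_hour_price: float, node_count: int, avg_qps: float) -> float:
    """Effective CPQ under node-hour pricing at a given average QPS."""
    return (node_hour_price * node_count) / (avg_qps * 3600)


def break_even_qps(node_hour_price: float, node_count: int,
                   per_query_price: float) -> float:
    """Average QPS at which node-hour pricing equals per-query pricing."""
    return (node_hour_price * node_count) / (per_query_price * 3600)


# Hypothetical inputs: $0.50 per node-hour, 200 QPS per node at the target
# recall and latency, 500 QPS peak, 300 QPS average, $0.000002 per query.
nodes = nodes_needed(peak_qps=500, qps_per_node=200)
node_hour_cpq = effective_cpq(0.50, nodes, avg_qps=300)
threshold = break_even_qps(0.50, nodes, per_query_price=0.000002)

print(nodes)
print(f"{node_hour_cpq:.8f}")
print(f"{threshold:.1f}")
```

Under these assumed numbers, node-hour pricing is cheaper whenever average QPS stays above the break-even value. Sizing by peak QPS while dividing by average QPS is a deliberate choice here: capacity is bought for the peak, but cost is spread over the queries actually served.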

Frequently Asked Questions

What is cost per query (CPQ)?

The total cost to serve a single vector search request: compute, memory, I/O, and (in the cloud) instance and network costs. It is key for capacity planning and for comparing managed vs. self-hosted VDBs, and is calculated as infrastructure cost over a period divided by the number of queries in that period.

What factors affect CPQ?

Latency (longer queries tie up resources longer), throughput (higher QPS spreads fixed costs over more queries), index type (e.g. HNSW vs. IVF), and filtering or hybrid search (both add CPU work). Spot instances can lower CPQ for batch or delay-tolerant workloads.

How do I optimize CPQ?

Tune efSearch, nprobe, and index choice, since they affect both latency and CPU use. Managed services often price by query or by node-hour, so map your QPS and latency profile to an effective CPQ before comparing options.

Does filtering or hybrid search increase CPQ?

Yes. Metadata filtering and hybrid search both add CPU work per query, which raises CPQ. Weigh that cost against your QPS and recall/latency requirements.