Custom distance functions: Are they supported?
Support for custom distance functions varies by product. Many vector databases and ANN libraries support a fixed set of metrics (L2, cosine, dot product, sometimes L1 or Hamming) and do not allow user-defined functions; others expose a plugin or callback API for custom distances.
Summary
- Built-in metrics are optimized (e.g. SIMD, GPU) and assumed by index structure (e.g. HNSW’s pruning). Custom functions are often slower and may not satisfy metric axioms.
- Options: (1) transform vectors so L2 in new space = desired similarity (e.g. Mahalanobis after whitening); (2) use a system that supports your metric; (3) two-stage: ANN with built-in metric, then re-rank with custom function. L2, cosine, dot product cover most use cases.
- Custom callbacks rarely get SIMD/GPU optimization; use for re-ranking or small-scale brute-force, not as the primary ANN distance.
- Pipeline: prefer built-in L2/cosine/IP; if you need a custom notion of similarity, pre-transform data or re-rank with your function after ANN.
- Practical tip: pre-whitening + L2 (Mahalanobis) or re-ranking with a custom scorer preserves ANN speed while allowing application-specific similarity.
Why custom is limited
Built-in metrics are heavily optimized (e.g. SIMD, GPU kernels) and are assumed by the index structure (e.g. HNSW’s pruning relies on metric properties). A custom function implemented in Python or another high-level language is usually much slower and may not satisfy the metric axioms that the index expects, which can hurt recall or correctness.
So even when custom distances are supported, they often apply only to brute-force or small-scale search, not to the main ANN path. Practical tip: if your similarity is a monotonic transform of L2 or dot product (e.g. a scaled or shifted version), use the built-in metric and transform scores in application code instead of plugging a custom distance into the index.
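The monotonic-transform tip can be sketched in a few lines of NumPy. This is a hedged illustration, not any particular database's API: brute-force squared L2 stands in for the index's built-in metric, and `exp(-d²/tau)` is an assumed application-level similarity that is a monotonic (decreasing) transform of it, so the index's ranking can be reused unchanged.

```python
import numpy as np

# Toy corpus and query; in practice these come from your embedding model.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64)).astype(np.float32)
query = rng.normal(size=(64,)).astype(np.float32)

# Squared L2 distances, computed with vectorized NumPy (a stand-in for the
# index's built-in metric).
d2 = ((corpus - query) ** 2).sum(axis=1)

# Suppose the application's similarity is exp(-d^2 / tau), a monotonically
# decreasing transform of squared L2. Ranking by d2 ascending is then
# identical to ranking by this similarity descending, so the index ordering
# can be kept and only the k returned scores transformed in app code.
tau = 10.0
k = 5
top_k = np.argsort(d2)[:k]          # ranking from the built-in metric
scores = np.exp(-d2[top_k] / tau)   # transform only the k returned scores

# Sanity check: the two orderings agree.
assert list(top_k) == list(np.argsort(-np.exp(-d2 / tau))[:k])
```

The key point is that the transform happens on `k` scores after the search, so the index never sees the custom function.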
Alternatives when you need a non-standard metric
If you need a non-standard metric, options include: (1) transform your vectors so that a standard metric (e.g. L2) in the new space corresponds to your desired similarity (e.g. Mahalanobis after whitening); (2) use a system that explicitly supports your metric (e.g. some research or in-house indexes); (3) do a two-stage pipeline—ANN with a built-in metric, then re-rank with your custom function.
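Option (1) for Mahalanobis can be sketched as follows. This is a minimal NumPy illustration with synthetic data: the whitening map is `W = L.T`, where `L` is the Cholesky factor of the inverse covariance, so plain L2 in the transformed space equals Mahalanobis distance in the original space.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))  # correlated features

# Estimate the covariance and its inverse from the data.
sigma = np.cov(data, rowvar=False)
sigma_inv = np.linalg.inv(sigma)

# Cholesky factor L with L @ L.T == sigma_inv. The whitening map is W = L.T,
# because (x - y)^T sigma_inv (x - y) == || L.T @ (x - y) ||^2.
L = np.linalg.cholesky(sigma_inv)
W = L.T

x, y = data[0], data[1]
d_mahalanobis = np.sqrt((x - y) @ sigma_inv @ (x - y))
d_l2_whitened = np.linalg.norm(W @ x - W @ y)

assert np.isclose(d_mahalanobis, d_l2_whitened)
# Index W @ v for every vector v; the ANN engine's plain L2 search then
# ranks candidates by Mahalanobis distance with no custom callback at all.
```

The one-time cost is a matrix multiply per vector at ingest and per query at search time; the ANN index itself runs at full built-in-metric speed.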
For most applications, L2, cosine, and dot product are sufficient and give the best performance and ecosystem support. The trade-off: custom metrics offer flexibility but sacrifice speed and index compatibility, while pre-transformation or re-ranking preserves ANN performance and still allows application-specific scoring. Mahalanobis distance is the canonical example of the pre-whiten + L2 pattern.
Pipeline summary: default to a built-in metric (L2, cosine, IP) at collection creation. If your similarity is non-standard, either (1) pre-transform vectors so L2 in the new space matches your notion, (2) use a two-stage pipeline (ANN with L2/cosine, then re-rank with your function), or (3) use a system that documents support for your metric. Reserve custom callbacks for re-ranking or small brute-force jobs. The mathematical properties of a metric space matter here because the index may assume the triangle inequality when pruning.
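The two-stage pipeline can be sketched like this. It is a hedged illustration with made-up data: brute-force L2 stands in for the ANN stage (in practice an HNSW or IVF index), and `custom_score` is a hypothetical application-specific scorer invented for the example.

```python
import numpy as np

def ann_candidates(corpus, query, k):
    """Stage 1: candidate generation with the built-in metric.
    Brute-force L2 stands in here for a real ANN index (HNSW, IVF, ...)."""
    d2 = ((corpus - query) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

def custom_score(query, vec):
    """Stage 2: hypothetical application-specific scorer (higher is better).
    Any Python function works here, because it only runs on k candidates."""
    cos = vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query) + 1e-9)
    return 0.7 * cos - 0.3 * abs(float(vec[0]) - float(query[0]))  # made-up logic

rng = np.random.default_rng(2)
corpus = rng.normal(size=(10_000, 32)).astype(np.float32)
query = rng.normal(size=(32,)).astype(np.float32)

# Fetch more candidates than you finally need, then re-rank the small set
# with the custom scorer; its Python-level cost is paid on 100 vectors, not 10,000.
cand = ann_candidates(corpus, query, k=100)
reranked = sorted(cand, key=lambda i: custom_score(query, corpus[i]), reverse=True)
top_10 = reranked[:10]
```

Over-fetching in stage 1 (here 100 candidates for a final top-10) is the usual lever for trading recall against re-ranking cost.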
Frequently Asked Questions
Can I use Mahalanobis in a vector DB?
Often only by pre-whitening vectors and using L2 in the transformed space. Few DBs support Mahalanobis natively; see Mahalanobis distance.
Why are custom distances slow?
No SIMD/GPU kernels; often Python or callback overhead. Built-in L2/dot product use optimized C/C++ and vectorized instructions.
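The overhead is easy to see in a toy comparison: the same squared-L2 computation written as a per-element Python loop (the shape a naive custom callback takes) versus one vectorized NumPy expression that dispatches to optimized C kernels. The numbers below are illustrative; only the equality of results is checked.

```python
import time
import numpy as np

rng = np.random.default_rng(3)
corpus = rng.normal(size=(2000, 64)).astype(np.float32)
query = rng.normal(size=(64,)).astype(np.float32)

def python_l2(corpus, query):
    # Per-element Python loop: roughly what a naive custom callback costs.
    out = []
    for row in corpus:
        s = 0.0
        for a, b in zip(row, query):
            s += (a - b) ** 2
        out.append(s)
    return np.array(out)

def vectorized_l2(corpus, query):
    # One NumPy expression: runs in optimized, vectorized C.
    return ((corpus - query) ** 2).sum(axis=1)

t0 = time.perf_counter()
slow = python_l2(corpus, query)
t1 = time.perf_counter()
fast = vectorized_l2(corpus, query)
t2 = time.perf_counter()

slow_s, fast_s = t1 - t0, t2 - t1  # the loop is typically orders of magnitude slower
assert np.allclose(slow, fast, rtol=1e-4)
```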
What if my “distance” isn’t a metric?
ANN indexes that assume the triangle inequality may return wrong or suboptimal results. Use brute-force, or transform so a standard metric matches your notion of similarity.
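An exact scan makes no metric assumptions, so it is the safe fallback for non-metric "distances". As a hedged sketch, the example below ranks probability vectors by KL divergence, which is asymmetric and violates the triangle inequality, so it is not a metric, yet brute force ranks by it correctly.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): asymmetric, violates the triangle inequality -- not a
    metric. An exact scan still ranks by it correctly, because it never
    prunes candidates based on metric axioms."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(4)
corpus = rng.random((1000, 16))
corpus /= corpus.sum(axis=1, keepdims=True)   # rows are probability vectors
query = rng.random(16)
query /= query.sum()

# Brute force: evaluate the divergence for every row, take the k smallest.
scores = np.array([kl_divergence(query, row) for row in corpus])
top_5 = np.argsort(scores)[:5]
```

At small scale this is fine; at large scale, prefer transforming the data so a standard metric approximates your notion of similarity.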
Does Faiss support custom metrics?
Faiss's core indexes target L2 and inner product (some flat indexes accept additional built-in metrics), and it does not take user-defined distance callbacks. For a custom metric you typically pre-transform your data or use a different library that allows pluggable distances.