Question

What are the trade-offs between recall, latency, and memory in ANN indexes?

Accepted Answer

The three metrics form a triangle you cannot optimize all at once. Higher recall requires more candidates to consider, which costs latency, or denser graph connections, which cost memory. Lower latency requires aggressive pruning or quantization, which cost recall. Lower memory requires compression like product quantization, which also costs recall. Production systems typically target 95 to 99 percent recall under 50 millisecond latency, then tune memory to fit hardware. The ann-benchmarks project makes these trade-offs explicit across algorithms.