Question

How do you tune HNSW for higher recall versus lower latency?

Accepted Answer

Increase ef_search at query time to raise recall: a larger candidate priority queue explores more of the graph before returning results, at the cost of roughly proportionally higher latency. Increase M and ef_construction at build time to get a better underlying graph, which lifts the ceiling on recall but requires reindexing and, for larger M, more memory per vector. Typical production settings use M of 16 to 32, ef_construction of 100 to 200, and ef_search between 40 and 200, tuned per workload. Before deploying the tuned values, measure recall against ground truth from a flat (exact, brute-force) index on a representative sample of queries.
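To make the measurement step concrete, here is a minimal pure-Python sketch of a recall check against exact ground truth, swept over several ef_search values. The function names and the search(query, k, ef) callable are assumptions for illustration, not any library's API; in practice you would wrap your own HNSW library's query call there, and use a real flat index rather than this slow brute-force loop.

```python
import heapq


def flat_search(vectors, query, k):
    """Exact top-k by squared L2 distance: the brute-force ground truth."""
    dists = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    )
    return [idx for _, idx in heapq.nsmallest(k, dists)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate result recovered."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


def sweep_ef_search(search, vectors, queries, k, ef_values):
    """Return {ef: mean recall@k} for each candidate ef_search value.

    `search(query, k, ef)` is a hypothetical wrapper around whatever HNSW
    library is in use; it must return a list of neighbour ids.
    """
    truth = [flat_search(vectors, q, k) for q in queries]
    results = {}
    for ef in ef_values:
        recalls = [
            recall_at_k(search(q, k, ef), exact)
            for q, exact in zip(queries, truth)
        ]
        results[ef] = sum(recalls) / len(recalls)
    return results
```

A reasonable way to use this is to time each ef_search value alongside the recall it achieves, then pick the smallest value that meets your recall target. If even the largest ef_search falls short, that suggests the graph itself is the limit and M or ef_construction needs to go up.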