Question

What is BEIR and what does it measure?

Accepted Answer

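BEIR (Benchmarking IR) is a heterogeneous benchmark of 18 datasets spanning domains such as scientific papers, biomedical literature, news, and fact checking. It evaluates retrieval models in a zero-shot setting: a model is tested on each dataset without being fine-tuned on it, so the score reflects out-of-domain generalization rather than in-domain fit. The primary metric is nDCG@10 (normalized discounted cumulative gain over the top 10 results), which rewards placing relevant documents near the top of the ranking. The BEIR results showed that BM25 is a strong baseline and that dense retrievers often underperform it out of domain, which is one motivation for hybrid (lexical plus dense) search as a robust default. It is a widely used benchmark for evaluating general-purpose retrieval models.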
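
To make the metric concrete, here is a minimal sketch of nDCG@10 for a single query. It is not BEIR's own evaluation code. The function names, document IDs, and relevance labels are made up for illustration, and it assumes linear gain (the relevance label itself) with a log2 rank discount; some formulations use 2^rel - 1 as the gain instead.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: doc IDs in the order the retriever returned them.
    qrels: dict mapping doc ID -> graded relevance (unjudged docs count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mean_ndcg_at_k(runs, all_qrels, k=10):
    """Average nDCG@k over the queries that have relevance judgments."""
    scores = [ndcg_at_k(runs.get(qid, []), rels, k) for qid, rels in all_qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: one query, two judged documents.
qrels = {"d1": 2, "d3": 1}
ranking = ["d3", "d2", "d1"]  # the best document was ranked third
print(round(ndcg_at_k(ranking, qrels), 4))
```

A ranking that puts `d1` first and `d3` second would score 1.0. The score for a dataset is this per-query value averaged over all of its test queries.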