What is BEIR and what does it measure?
RAG & Vector DB Interview: RAG Evaluation, RAGAS, Faithfulness, Retrieval Metrics
Audio flashcard · 0:30 · Nortren
BEIR, or Benchmarking Information Retrieval, is a heterogeneous benchmark of 18 datasets spanning domains such as scientific papers, biomedical literature, news, and fact checking. It evaluates retrieval models in a zero-shot setting: models are not fine-tuned on the target datasets, and nDCG@10 is the primary metric. BEIR showed that dense retrievers often underperform BM25 out of domain, which motivates hybrid search (combining lexical and dense retrieval) as a robust default. It is the standard benchmark for evaluating general-purpose retrieval models.
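To make the primary metric concrete, here is a minimal sketch of nDCG@10 for a single query. It assumes linear gain (the relevance label itself) and a log2(rank + 1) discount, which is one common formulation; some variants use an exponential gain of 2^rel - 1 instead. The function names and the `qrels` dictionary shape are illustrative, not taken from the BEIR codebase.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping doc id -> graded relevance (hypothetical shape);
           documents missing from qrels count as relevance 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant judged documents first.
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / ideal_dcg

# Two relevant documents, retrieved at ranks 1 and 3.
example_qrels = {"d1": 1, "d2": 1}
example_score = ndcg_at_k(["d1", "d3", "d2"], example_qrels)
```

A benchmark-level score is then typically the mean of this per-query value over all queries in a dataset.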
arxiv.org