Memotiva · RAG & Vector DB Interview: RAG Evaluation, RAGAS, Faithfulness, Retrieval Metrics

What is BEIR and what does it measure?

Nortren

BEIR, or Benchmarking Information Retrieval, is a heterogeneous benchmark of 18 datasets spanning domains such as scientific papers, biomedical literature, news, and fact checking. It evaluates retrieval models in a zero-shot setting: a model is not fine-tuned on each target dataset, so scores reflect out-of-domain generalization. The primary metric is nDCG@10. BEIR showed that dense retrievers often underperform BM25 in out-of-domain settings, which motivated hybrid search (combining lexical and dense retrieval) as a robust default. It is widely used as a standard benchmark for evaluating general-purpose retrieval models.
arxiv.org
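
To make the primary metric concrete, here is a minimal sketch of nDCG@10 for a single query. It assumes the linear-gain form of DCG (relevance divided by log2 of rank plus one); some definitions use an exponential gain instead. The function names and the toy `qrels` dictionary are illustrative, not part of BEIR's own tooling.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance grades (linear gain, assumed)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: hypothetical dict mapping doc id -> graded relevance (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the most relevant judged documents first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Toy example: two relevant documents, three retrieved.
qrels = {"d1": 1, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], qrels)
partial = ndcg_at_k(["d3", "d1"], qrels)
```

A benchmark score is then the mean of this per-query value over all queries in a dataset; the ranking that places both relevant documents first scores 1.0, while the second ranking is penalized for missing `d2` and for placing `d1` at rank 2.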