Memotiva · RAG & Vector DB Interview: Hybrid Search, BM25, Rerankers, ColBERT, RRF Explained

What is BM25 and why is it still used in modern RAG systems?

Nortren

BM25 (Best Matching 25) is a probabilistic ranking function from the 1990s that scores a document against a query using three signals: term frequency, inverse document frequency, and document length normalization. Despite its age, BM25 remains competitive because it rewards exact keyword matches, so rare terms, product names, and identifiers that dense embeddings often smooth over still rank highly. Modern RAG systems combine BM25 with dense retrieval in hybrid search, letting each method cover the other's weaknesses. BM25 is the default scoring function in Elasticsearch and OpenSearch, and the usual lexical component of hybrid search implementations.
en.wikipedia.org
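
To make the three signals concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration, not the code of any particular search engine. The function name `bm25_scores`, the whitespace tokenization, and the toy documents are assumptions made for this example. The parameter values k1 = 1.2 and b = 0.75 are commonly used defaults, and the IDF uses the non-negative variant found in Lucene-style implementations.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Inverse document frequency: rare terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus: the identifier "E1234" appears in one document only.
docs = [
    "the quick brown fox".split(),
    "error code E1234 in the payment service".split(),
    "payment service overview and the architecture".split(),
]
scores = bm25_scores("E1234 payment".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

In this toy corpus the document containing the rare identifier scores highest, which is the exact-match behavior that makes BM25 a useful complement to dense retrieval.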