What is ColBERT and how does late interaction work?
RAG & Vector DB Interview: Hybrid Search, BM25, Rerankers, ColBERT, RRF Explained
Nortren
ColBERT is a retrieval model introduced by Khattab and Zaharia in 2020 that stores one embedding per token rather than one per document, enabling late interaction: the query and the document are encoded independently and compared only at scoring time. At query time, every query token embedding is compared to every document token embedding; for each query token the maximum similarity is kept, and these maxima are summed to score the document (the MaxSim operator). This preserves fine-grained token-level matching while remaining scalable, because document embeddings are precomputed offline. ColBERT models often approach cross-encoder quality at a query-time cost far closer to a bi-encoder's, at the price of a larger index, which makes late interaction attractive for large-scale retrieval.
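The scoring step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the ColBERT codebase: the function names `normalize` and `maxsim_score` and the 2-dimensional toy vectors are assumptions made for the example, and vectors are normalized inside the function for brevity, whereas a real system would precompute and store the document token embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the best-matching
    document token (max cosine similarity), then sum those maxima."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]  # hypothetical: precomputed offline in practice
    score = 0.0
    for qv in q:
        score += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return score


# Toy token embeddings (made up for illustration, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for each query token
doc_b = [[1.0, 1.0], [-1.0, 0.0]]             # only partial matches

ranked = sorted(
    [("doc_a", maxsim_score(query, doc_a)), ("doc_b", maxsim_score(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

Here doc_a ranks above doc_b because each query token finds an exact match in doc_a, while in doc_b the best match for each query token is only partial.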
arxiv.org