What is late chunking and how does it improve retrieval?
RAG & Vector DB Interview: Chunking Strategies, Overlap, Size, Semantic Splitting
Audio flashcard · 0:29 · Nortren
Late chunking, introduced by Jina AI in 2024, reverses the usual order of chunking and embedding. A long-context embedding model first encodes the entire document, producing one contextualized vector per token. Each chunk embedding is then derived by pooling (typically mean pooling) the token vectors that fall inside that chunk's boundaries. Because every token has already attended to the rest of the document, the chunk embeddings preserve cross-chunk context, such as pronouns and references to earlier sections, that standard chunking destroys by embedding each chunk in isolation. This improves retrieval on queries that depend on information scattered across sections. It requires a long-context embedding model that exposes token-level embeddings so they can be pooled per chunk.
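The pooling step can be illustrated with a minimal sketch. It assumes the long-context model has already produced one contextualized vector per token; the toy `token_embeddings` values, the `spans` boundaries, and the helper names `mean_pool` and `late_chunk` are hypothetical and chosen only for illustration, not taken from any particular library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def late_chunk(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Pool document-level token embeddings into one vector per chunk.

    `spans` holds half-open (start, end) token index ranges, one per chunk.
    """
    chunk_embeddings = []
    for start, end in spans:
        if not (0 <= start < end <= len(token_embeddings)):
            raise ValueError(f"invalid span: ({start}, {end})")
        chunk_embeddings.append(mean_pool(token_embeddings[start:end]))
    return chunk_embeddings


# Toy stand-in for the per-token output of a long-context embedding model
# run over the WHOLE document (assumption: 4 tokens, 2 dimensions).
token_embeddings = [
    [1.0, 0.0],
    [3.0, 2.0],
    [0.0, 4.0],
    [2.0, 6.0],
]

# Hypothetical chunk boundaries: tokens 0-1 form chunk 0, tokens 2-3 form chunk 1.
spans = [(0, 2), (2, 4)]

chunk_embeddings = late_chunk(token_embeddings, spans)
print(chunk_embeddings)  # [[2.0, 1.0], [1.0, 5.0]]
```

In a real pipeline the token vectors would come from the model's final hidden states, and the spans would be computed by mapping chunk character boundaries to token indices. The key difference from standard chunking is only the order: the model sees the full document before any chunk boundary is applied.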
arxiv.org