RAG & Vector DB Interview: Chunking Strategies, Overlap, Size, Semantic Splitting

What is late chunking and how does it improve retrieval?


Late chunking, introduced by Jina AI in 2024, embeds the entire document first using a long-context model, then derives chunk embeddings by pooling the document-level token representations within each chunk's span. This preserves cross-chunk context, such as pronouns and references to earlier sections, that standard chunking destroys. The resulting chunk embeddings carry document-wide semantic context, which improves retrieval on queries that depend on information scattered across sections. It requires a long-context embedding model that exposes token-level outputs, since standard embedding APIs return only a single pooled vector per input.
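The mechanics can be sketched as follows: encode the whole document in one pass, then mean-pool token vectors per chunk span. The `encode_tokens` function here is a stand-in (random vectors) for a real long-context model's token-level output, and the span boundaries are hypothetical; only the pooling step illustrates the technique.

```python
import numpy as np

def encode_tokens(tokens):
    # Stand-in for a long-context model's token-level output; in a real
    # system every token vector here would already carry full-document
    # context because the whole document was encoded in one pass.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(tokens), 4))

def late_chunk(tokens, boundaries):
    """Embed the full document once, then mean-pool the token vectors
    inside each chunk span to get context-aware chunk embeddings."""
    token_vecs = encode_tokens(tokens)   # single pass over the document
    return np.stack([token_vecs[start:end].mean(axis=0)
                     for start, end in boundaries])

doc = "Alice founded Acme . She sold it in 2020 .".split()
# Chunk spans come from any splitter (sentence, fixed-size, semantic, ...)
spans = [(0, 4), (4, 10)]
emb = late_chunk(doc, spans)
print(emb.shape)  # (2, 4): one embedding per chunk
```

Contrast this with standard chunking, which would call the encoder separately on each span, so the second chunk's embedding could never resolve "She" or "it" back to Alice and Acme.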