RAG & Vector DB Interview: Chunking Strategies, Overlap, Size, Semantic Splitting

What is late chunking and how does it improve retrieval?


Late chunking, introduced by Jina AI in 2024, embeds the entire document first using a long-context model, then derives chunk embeddings by pooling the document-level token representations within each chunk's span. This preserves cross-chunk context, such as pronouns and references to earlier sections, that standard chunking destroys. The resulting chunk embeddings carry document-wide semantic context, which improves retrieval on queries that depend on information scattered across sections. It requires a long-context embedding model that exposes token-level outputs, since standard embedding APIs return only a single pooled vector per input.
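The mechanics can be sketched as follows: encode the whole document in one pass, then mean-pool token vectors per chunk span. The `encode_tokens` function here is a stand-in (random vectors) for a real long-context model's token-level output, and the span boundaries are hypothetical; only the pooling step illustrates the technique.

```python
import numpy as np

def encode_tokens(tokens):
    # Stand-in for a long-context model's token-level output; in a real
    # system every token vector here would already carry full-document
    # context because the whole document was encoded in one pass.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(tokens), 4))

def late_chunk(tokens, boundaries):
    """Embed the full document once, then mean-pool the token vectors
    inside each chunk span to get context-aware chunk embeddings."""
    token_vecs = encode_tokens(tokens)   # single pass over the document
    return np.stack([token_vecs[start:end].mean(axis=0)
                     for start, end in boundaries])

doc = "Alice founded Acme . She sold it in 2020 .".split()
# Chunk spans come from any splitter (sentence, fixed-size, semantic, ...)
spans = [(0, 4), (4, 10)]
emb = late_chunk(doc, spans)
print(emb.shape)  # (2, 4): one embedding per chunk
```

Contrast this with standard chunking, which would call the encoder separately on each span, so the second chunk's embedding could never resolve "She" or "it" back to Alice and Acme.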