RAG & Vector DB Interview: Chunking Strategies, Overlap, Size, Semantic Splitting

What is chunk overlap and why is it used?

Nortren

Chunk overlap is the number of tokens or characters repeated between adjacent chunks, typically 10 to 20 percent of the chunk size. It prevents information loss when a sentence or concept straddles a chunk boundary; without overlap, that context is split across two chunks, which hurts retrieval for queries about that exact topic. The trade-off is extra storage and near-duplicate results at retrieval time, which reranking or deduplication can handle. A common default is 50 to 100 tokens of overlap on a 512-token chunk.
python.langchain.com
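
To make the overlap idea concrete, here is a minimal sliding-window sketch in plain Python. The function name chunk_with_overlap and its parameters are hypothetical, chosen for illustration and not taken from any library. It assumes the text has already been tokenized into a list; a real pipeline would use the embedding model's own tokenizer instead of a stand-in list of integers.

```python
def chunk_with_overlap(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows where adjacent windows share `overlap` tokens.

    Hypothetical helper for illustration only; not a library API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input,
        # so no trailing chunk made only of repeated tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Stand-in "tokens": integers make the shared region easy to inspect.
tokens = list(range(1000))
chunks = chunk_with_overlap(tokens, chunk_size=512, overlap=64)

# The last 64 tokens of one chunk are the first 64 of the next.
shared = chunks[0][-64:] == chunks[1][:64]

# Storage overhead: tokens stored across all chunks vs. tokens in the source.
stored = sum(len(c) for c in chunks)
overhead = stored / len(tokens)
```

With these inputs the 64-token overlap is 12.5 percent of the chunk size, inside the 10 to 20 percent range mentioned above, and the overhead ratio shows the extra storage that the repeated tokens cost.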