Question

What is chunking in RAG and why is it necessary?

Accepted Answer

Chunking is the process of splitting documents into smaller pieces before embedding them for retrieval. It is necessary for three reasons. First, embedding models have input token limits, usually 512 to 8192 tokens. Second, retrieval precision drops when a single embedding must represent too much content. Third, language model context windows often cannot fit entire documents, especially when several are retrieved for one query. Well-sized chunks let the retriever return focused passages directly relevant to the query rather than entire documents where the answer is buried among unrelated text.
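
To make the idea concrete, here is a minimal sketch of fixed-size chunking in Python. It is only an illustration, not a reference implementation: the function name `chunk_words` and its parameters are hypothetical, whitespace-separated words stand in for real tokens (an actual pipeline would count tokens with the embedding model's own tokenizer), and the overlap between neighboring chunks is an assumption added here so that a sentence cut at a boundary still appears whole in one of the two chunks.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Hypothetical helper: words approximate tokens, and consecutive
    chunks share `overlap` words (an assumption, not a requirement).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made only of overlap words is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Each returned string would then be embedded and indexed separately. The right `chunk_size` depends on the embedding model's limit and on how focused the retrieved passages need to be, so treat the defaults above as placeholders.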