How do you handle documents larger than the context window?
RAG & Vector DB Interview: Production RAG, Latency, Caching, Cost, Monitoring
Audio flashcard · 0:30 · Nortren
Chunk them at ingest into manageable pieces, typically 256 to 1024 tokens each with some overlap between neighboring chunks, and retrieve only the most relevant chunks per query. When the top chunks still do not fit in the context window, keep the highest-scoring ones, or use parent-child retrieval: match the query against small chunks, then return the larger parent passages they link to. For queries that need full-document understanding, summarize each document offline and index the summaries for initial retrieval, then fetch full documents only for the best matches. Because the first stage searches only compact summaries, this two-stage approach scales to very large corpora.
docs.llamaindex.ai
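
The chunk-then-prioritize steps in the answer can be sketched in plain Python. This is a minimal illustration, not the LlamaIndex API: `Chunk`, `chunk_tokens`, and `pack_by_relevance` are hypothetical names, the defaults of 512 tokens with a 64-token overlap are just one point inside the 256 to 1024 range, and the relevance scores are assumed to come from whatever retriever is in use.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    start: int          # index of the chunk's first token in the document
    tokens: list[str]   # tokens from any tokenizer; a real one is assumed upstream


def chunk_tokens(doc_id, tokens, chunk_size=512, overlap=64):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(Chunk(doc_id, start, tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


def pack_by_relevance(scored_chunks, token_budget):
    """Greedily keep the highest-scoring chunks that still fit in the budget.

    `scored_chunks` is an iterable of (score, Chunk) pairs; the scores are
    assumed to come from the retriever (hypothetical here).
    """
    selected, used = [], 0
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    for _score, chunk in ranked:
        cost = len(chunk.tokens)
        if used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected
```

The greedy pass skips a chunk that would overflow the budget but keeps checking lower-ranked chunks, so a short chunk can still fill the remaining space. Stopping at the first chunk that does not fit is a simpler alternative if strict rank order matters more than using the whole budget.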