How do you cache in a RAG system?
RAG & Vector DB Interview: Production RAG, Latency, Caching, Cost, Monitoring
Audio flashcard · 0:31 · Nortren
Cache at multiple layers: embeddings for repeated queries or documents, retrieval results for identical or near-identical queries, reranker outputs keyed by query-candidate pair, and generated answers for fully repeated requests. Build each key from the normalized query plus the configuration state that affects the result, such as the embedding model, index version, retrieval parameters, and prompt version. Exact-match caching serves power users who repeat the same questions. Semantic caching, which counts a query as a hit when its embedding is similar enough to a cached query's, also covers paraphrases. Invalidate caches whenever the underlying index or prompts change so you do not serve stale answers.
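A minimal sketch of the answer-level layer, assuming only the Python standard library. It shows a key built from the normalized query plus a config fingerprint, an exact-match lookup first, then a semantic fallback that only considers entries created under the same config. Every name here (`normalize_query`, `cache_key`, `AnswerCache`, `toy_embed`, the config fields `index_version`/`prompt_version`/`top_k`, the 0.9 threshold) is hypothetical and not a LangChain API. `toy_embed` is a stand-in for a real embedding model, and a production cache would use a vector index instead of a linear scan.

```python
from __future__ import annotations

import hashlib
import json
import math
import unicodedata
from typing import Callable, List, Optional

Vector = List[float]


def normalize_query(query: str) -> str:
    """Unicode-normalize, case-fold, and collapse whitespace so trivial variants share a key."""
    text = unicodedata.normalize("NFKC", query).casefold()
    return " ".join(text.split())


def config_fingerprint(config: dict) -> str:
    """Stable string for the (hypothetical) settings that change the answer."""
    return json.dumps(config, sort_keys=True)


def cache_key(query: str, config: dict) -> str:
    """Exact-match key: hash of the normalized query plus the config fingerprint."""
    payload = normalize_query(query) + "\x00" + config_fingerprint(config)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class AnswerCache:
    """Exact-match dict first, then a linear-scan semantic fallback."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9) -> None:
        self.embed = embed            # caller-supplied embedding function
        self.threshold = threshold    # minimum cosine similarity for a semantic hit
        self.exact: dict = {}
        self.entries: list = []       # (vector, config fingerprint, answer)

    def get(self, query: str, config: dict) -> Optional[str]:
        hit = self.exact.get(cache_key(query, config))
        if hit is not None:
            return hit
        fp = config_fingerprint(config)
        vec = self.embed(normalize_query(query))
        best, best_score = None, self.threshold
        for cached_vec, cached_fp, answer in self.entries:
            if cached_fp != fp:  # never reuse answers built under another index/prompt
                continue
            score = cosine(vec, cached_vec)
            if score >= best_score:
                best, best_score = answer, score
        return best

    def put(self, query: str, config: dict, answer: str) -> None:
        self.exact[cache_key(query, config)] = answer
        vec = self.embed(normalize_query(query))
        self.entries.append((vec, config_fingerprint(config), answer))

    def clear(self) -> None:
        """Drop everything, e.g. to reclaim memory after re-indexing."""
        self.exact.clear()
        self.entries.clear()


VOCAB = ["reset", "password", "delete", "account"]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in embedding: word counts over a tiny fixed vocabulary."""
    words = [w.strip("?.!,") for w in text.split()]
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    cfg = {"index_version": 1, "prompt_version": "v3", "top_k": 5}
    cache = AnswerCache(toy_embed)
    cache.put("Reset my password", cfg, "Use the 'Forgot password' link.")
    print(cache.get("reset   my PASSWORD", cfg))          # exact hit after normalization
    print(cache.get("How do I reset my password?", cfg))   # semantic hit on a paraphrase
    print(cache.get("Delete my account", cfg))             # None: not similar enough
    print(cache.get("Reset my password", {**cfg, "index_version": 2}))  # None: new index
```

Because the index and prompt versions are part of the key and fingerprint, bumping either one makes old entries unreachable without any explicit purge; `clear()` only frees the memory they occupy. The threshold is the main tuning knob: set it too low and paraphrase matching starts returning answers to genuinely different questions.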
python.langchain.com