Question

Why does my RAG system retrieve irrelevant documents?

Accepted Answer

The usual causes are: chunks that are too large, so each embedding blurs several topics, or too small, so it loses the context needed to match; an embedding model that was not trained on your domain's vocabulary; missing metadata filters, so a query searches across unrelated content; queries phrased very differently from the documents (vocabulary mismatch); or a single-stage retrieval pipeline with no reranker to correct first-stage errors.

Diagnose by inspecting failed queries one at a time: which chunk should have been retrieved, where did it actually rank, and what scored higher? Patterns in the failures point to the fix. For example, if the right chunk consistently lands just outside the top k, a reranker or a larger k may help; if it ranks far down, look at chunking or the embedding model first.
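The per-query inspection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `diagnose`, `chunk_vecs`, and `expected_id` are hypothetical, the vectors are toy values rather than real embeddings, and cosine similarity is assumed as the scoring function (your vector store may use something else).

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diagnose(query_vec: List[float],
             chunk_vecs: Dict[str, List[float]],
             expected_id: str,
             top_k: int = 3) -> dict:
    """Report where the expected chunk ranked and which chunks beat it.

    Hypothetical helper: chunk_vecs maps chunk IDs to embeddings you
    have already computed with your own model.
    """
    if expected_id not in chunk_vecs:
        raise KeyError(f"expected chunk {expected_id!r} is not in the index")

    # Score every chunk against the query, best match first.
    scored = sorted(
        ((cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    ids = [cid for cid, _ in scored]
    rank = ids.index(expected_id) + 1  # 1-based rank

    return {
        "expected_rank": rank,
        "retrieved": rank <= top_k,
        # Everything that scored higher than the chunk you wanted.
        "outranked_by": scored[: rank - 1],
    }


# Toy example: chunk "c" is the one we wanted, but "a" is closer to the query.
chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
report = diagnose([1.0, 0.1], chunks, expected_id="c", top_k=1)
print(report["expected_rank"], report["retrieved"], report["outranked_by"])
```

Running this over a set of failed queries and reading the `outranked_by` chunks side by side with the expected chunk is usually enough to tell whether the problem looks like chunking, vocabulary mismatch, or missing filters.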