Question

What is contextual compression in RAG?

Accepted Answer

Contextual compression filters or rewrites retrieved chunks before they are sent to the LLM, dropping irrelevant content and keeping only the parts that bear on the query. This shrinks the prompt, lowers token cost, and helps the LLM focus on the relevant passages. The compressor itself can be a smaller model (for example, an embedding-similarity filter or a reranker), an LLM prompted to extract or summarize the relevant passages, or an extractor trained for the task.
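
To make the idea concrete, here is a minimal sketch of the filtering variant. It is only an illustration: it assumes a simple keyword-overlap score as a stand-in for the model-based relevance scoring a real compressor would use, and the names `compress_chunk`, `compress_chunks`, `min_overlap`, and the stopword list are hypothetical, not part of any library.

```python
import re

# Tiny illustrative stopword list (an assumption; real systems use a model, not keywords).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and",
             "what", "how", "does", "do", "for", "on", "it"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(chunk):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]


def compress_chunk(query, chunk, min_overlap=1):
    """Keep only the sentences sharing at least `min_overlap` tokens with the query."""
    query_tokens = tokenize(query)
    kept = [s for s in split_sentences(chunk)
            if len(query_tokens & tokenize(s)) >= min_overlap]
    return " ".join(kept)


def compress_chunks(query, chunks, min_overlap=1):
    """Compress every retrieved chunk and drop the ones left empty."""
    compressed = (compress_chunk(query, c, min_overlap) for c in chunks)
    return [c for c in compressed if c]


# Example: two retrieved chunks, only one sentence of which is relevant.
retrieved = [
    "Water boils at 100 degrees Celsius at sea level. "
    "The author grew up in Ohio. Many people enjoy tea.",
    "Bananas are rich in potassium.",
]
context = compress_chunks("What is the boiling point of water?", retrieved)
print(context)  # ['Water boils at 100 degrees Celsius at sea level.']
```

The compressed `context` is what would go into the prompt instead of the raw chunks. Swapping the overlap test for an embedding similarity threshold, a reranker score, or an LLM extraction call changes the quality of the filter but not the overall shape of the pipeline.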