Question

What is the contextual compression technique?

Accepted Answer

Contextual compression filters or rewrites retrieved chunks before they are sent to the LLM, removing irrelevant content and keeping only the parts that actually answer the query. This reduces prompt size, lowers token cost, and helps the LLM focus on the relevant material. The compression step can be performed by a smaller model, an extractive filter, or a separate LLM call that runs before the main generation.
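
To make the idea concrete, here is a minimal sketch of the extractive variant in plain Python. It is an illustration under stated assumptions, not a reference implementation: a simple keyword-overlap test stands in for the smaller model or LLM call, and the names `compress_chunk`, `compress_context`, `STOPWORDS`, and the `min_overlap` parameter are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "when", "what", "how",
             "of", "to", "in", "and", "for", "on", "it"}

def tokenize(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def compress_chunk(query, chunk, min_overlap=1):
    """Keep only sentences sharing at least `min_overlap` terms with the query."""
    query_terms = tokenize(query)
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences if len(tokenize(s) & query_terms) >= min_overlap]
    return " ".join(kept)

def compress_context(query, chunks, min_overlap=1):
    """Compress every retrieved chunk and drop chunks with nothing relevant left."""
    compressed = (compress_chunk(query, c, min_overlap) for c in chunks)
    return [c for c in compressed if c]

# Stand-in for retriever output (made-up example data).
chunks = [
    "The Eiffel Tower is in Paris. It was completed in 1889. Tickets can be bought online.",
    "Bananas are rich in potassium. They grow in tropical climates.",
]
query = "When was the Eiffel Tower completed?"

context = compress_context(query, chunks)
print(context)
# ['The Eiffel Tower is in Paris. It was completed in 1889.']
```

Only the compressed `context` would then be placed in the prompt for the main generation call. In a real pipeline the overlap test would typically be replaced by an embedding-similarity filter, a small reranker, or an LLM prompted to extract the relevant passages; the surrounding structure (compress each chunk, drop the empty ones) stays the same.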