Question

What is HyDE and how does it improve RAG retrieval?

Accepted Answer

HyDE, or Hypothetical Document Embeddings, is a technique where a language model first generates a hypothetical answer to the query, then that generated text is embedded and used as the retrieval query instead of the original question. This works because hypothetical answers tend to be semantically closer to real answer passages in embedding space than short questions are, improving retrieval on queries that differ stylistically from documents. HyDE was introduced by Gao and colleagues in 2022 and adds one extra language model call per query.