Prompt Engineering Patterns: Generated Knowledge, RAG Prompts, Citation and Grounding Techniques

Questions and materials on "Prompt Engineering Patterns: Generated Knowledge, RAG Prompts, Citation and Grounding Techniques"

What is the generated knowledge prompting pattern?

Generated knowledge prompting first asks the model to generate background facts about a topic, then uses those generated facts as context for the actual question. The model often performs better when it has explicitly stated relevant knowledge before reasoning, even when the knowledge comes from the same model that will use it. It is a self-prompting technique introduced in 2021.
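The two-step flow can be sketched as a pair of prompt builders plus a small driver. This is a minimal sketch: `llm` stands in for any completion call (an assumption, not a specific API), and the prompt wording is illustrative.

```python
# Generated knowledge prompting in two calls: (1) elicit background facts,
# (2) answer with those facts prepended as context.

def knowledge_prompt(question: str) -> str:
    """Step 1: ask the model to state relevant facts before answering."""
    return (
        "Generate a few relevant facts about the topic of this question.\n"
        f"Question: {question}\n"
        "Facts:"
    )

def answer_prompt(question: str, knowledge: str) -> str:
    """Step 2: answer the question using the self-generated knowledge."""
    return (
        f"Knowledge: {knowledge}\n\n"
        f"Question: {question}\n"
        "Use the knowledge above to answer. Answer:"
    )

def generated_knowledge_answer(question: str, llm) -> str:
    facts = llm(knowledge_prompt(question))     # call 1: generate knowledge
    return llm(answer_prompt(question, facts))  # call 2: reason over it
```

The same model serves both calls; the benefit comes from forcing the knowledge to be stated explicitly before reasoning begins.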

When should you use generated knowledge prompting?

Use generated knowledge for tasks that benefit from explicit context the model already knows but does not always retrieve when prompted directly. Common use cases include commonsense reasoning, domain knowledge questions, and fact verification. It works less well for tasks needing fresh or specialized information the model never learned, where retrieval-augmented generation is better.

What is a RAG prompt and how does it differ from a normal prompt?

A RAG prompt is a prompt that includes retrieved documents as context, with instructions to answer based on them. It typically has three sections: retrieved chunks marked as context, the user's question, and instructions to ground the answer in the context and cite sources. RAG prompts are how retrieval-augmented generation systems instruct the model to use external knowledge.
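A template with those three sections might look like the following sketch. The chunk format (a dict with `id` and `text` keys) and the exact instruction wording are assumptions for illustration.

```python
# Minimal RAG prompt template: retrieved chunks as labeled context,
# grounding instructions, then the user's question.

def build_rag_prompt(chunks: list, question: str) -> str:
    # Label each chunk with its document id so the model can cite it.
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the context below. Cite sources as [doc-id].\n"
        'If the context does not contain the answer, say "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```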

How do you instruct a model to cite sources?

To get citations, include the document IDs or names with each retrieved chunk, then instruct the model to reference them when stating facts. Specify the citation format in the instructions, like "Cite sources as [doc-id]." For higher reliability, ask the model to only state claims that are directly supported by a citation, and to abstain otherwise.
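A useful companion to this instruction is a post-hoc check that every `[doc-id]` the model cited actually corresponds to a retrieved document. This is a hypothetical regex-based helper, not part of any particular framework.

```python
import re

def invalid_citations(answer: str, doc_ids: set) -> set:
    """Return cited ids that do not match any retrieved document.

    A non-empty result signals a fabricated or malformed citation,
    so the caller can filter the answer or retry.
    """
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - doc_ids
```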

How do you reduce hallucinations in RAG prompts?

Reduce hallucinations by explicitly instructing the model to answer only from the provided context, to say "I don't know" when context is insufficient, to cite the source for each claim, and to avoid making up facts not present in the documents. Structure the prompt to keep the most important context near the question, since LLMs attend less to middle context.
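The four instructions above can be packaged as a reusable rule block, paired with a check for the instructed abstention phrase so the application can fall back (for example, to a broader retrieval pass) instead of surfacing a non-answer. The wording and helper are illustrative assumptions.

```python
# Anti-hallucination rules to prepend to a RAG prompt.
GROUNDING_RULES = (
    "Rules:\n"
    "1. Answer only from the provided context.\n"
    '2. If the context is insufficient, reply "I don\'t know."\n'
    "3. Cite the source document for every claim.\n"
    "4. Do not state facts that are not in the documents.\n"
)

def is_abstention(answer: str) -> bool:
    """Detect the instructed abstention phrase in a model answer."""
    return "i don't know" in answer.lower()
```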

What is the lost in the middle problem in long-context prompts?

Lost in the middle is the observation that LLMs often pay less attention to information placed in the middle of a long context compared to the beginning or end. This means even models with large context windows do not use them uniformly. Place the most important information at the start or end of the prompt, especially for RAG with many retrieved chunks.

How do you order retrieved chunks in a RAG prompt?

Order retrieved chunks by relevance, with the most relevant either at the start or end of the context window, where the model attends most. Some systems duplicate the top result at both ends. Avoid placing critical information in the middle. For complex queries, consider reranking before placement, or passing fewer, higher-precision chunks (for example, the top three instead of ten) to reduce noise.
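One way to implement edge-first placement is to interleave ranked chunks toward the two ends of the context, pushing the least relevant toward the middle. The interleaving scheme below is one illustrative choice, not a standard algorithm.

```python
def edge_order(chunks_by_relevance: list) -> list:
    """Reorder chunks so the most relevant sit at the edges.

    chunks_by_relevance[0] is the most relevant. Even-indexed chunks
    go to the front (in order), odd-indexed chunks to the back
    (reversed), so relevance decreases toward the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]
```

For five chunks ranked 1 (best) to 5, this yields the order 1, 3, 5, 4, 2: the top two results occupy the first and last positions, and the weakest chunk lands in the middle.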

What is the contextual compression technique?

Contextual compression filters or rewrites retrieved chunks before sending them to the LLM, removing irrelevant content and keeping only the parts that actually answer the query. This reduces prompt size, lowers cost, and helps the LLM focus. Compression can be done with a smaller model, an extractor, or another LLM call before the main generation.
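The extractive variant can be sketched with a naive keyword filter: keep only sentences from each chunk that share a word with the query. Real systems use a smaller model or an LLM extractor for this step, but the filtering shape is the same; the sentence-splitting regex here is a simplifying assumption.

```python
import re

def compress_chunk(chunk: str, query: str) -> str:
    """Keep only sentences that share at least one word with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    kept = [
        s for s in sentences
        if query_words & set(re.findall(r"\w+", s.lower()))
    ]
    return " ".join(kept)
```

Dropping off-topic sentences before generation shrinks the prompt and removes distractor text the model might otherwise latch onto.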