LLM Engineer Interview Questions: Advanced RAG Techniques — Self-RAG, GraphRAG, Agentic RAG

Gain insights into retrieval-augmented generation (RAG) techniques, including pipeline design, chunking strategies, and advanced methods like Self-RAG and GraphRAG. Understanding these concepts is essential for optimizing model performance in complex scenarios.

Nortren

What is the difference between naive RAG and advanced RAG?

Naive RAG embeds the query, retrieves top-K chunks, and feeds them to the LLM. Advanced RAG adds query rewriting, hybrid retrieval, reranking, contextual compression, multi-step retrieval, and grounding checks. Naive RAG is fine for simple use cases, but production-grade systems almost always use multiple advanced techniques to improve relevance and reduce hallucinations.
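
A minimal sketch of the contrast, using word overlap as a stand-in for embedding similarity (the corpus, scoring function, and "rerank + filter" step are all illustrative, not a real pipeline):

```python
CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower is located in Paris.",
    "Bananas are rich in potassium.",
]

def overlap_score(query: str, chunk: str) -> int:
    """Bag-of-words overlap as a toy stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def naive_rag(query: str, k: int = 2) -> list[str]:
    """Naive RAG: take top-k by a single similarity score, pass straight on."""
    return sorted(CORPUS, key=lambda c: overlap_score(query, c), reverse=True)[:k]

def advanced_rag(query: str, k: int = 1) -> list[str]:
    """Advanced RAG (sketch): over-retrieve, then rerank and drop chunks
    with no overlap at all -- a stand-in for reranking + compression."""
    candidates = naive_rag(query, k=len(CORPUS))
    reranked = [c for c in candidates if overlap_score(query, c) > 0]
    return reranked[:k]
```

The point of the sketch is the extra post-retrieval stage: naive RAG would happily pad the prompt with zero-relevance chunks, while the advanced path filters them before generation.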

What is query rewriting and why is it useful?

Query rewriting uses an LLM to transform the user's original question into one or more reformulations better suited for retrieval. Examples include expanding abbreviations, generating synonyms, breaking compound questions into parts, or adding context from conversation history. Query rewriting fixes the common problem of users phrasing queries differently from how documents are written.

What is HyDE and how does it work?

HyDE stands for Hypothetical Document Embeddings. Instead of embedding the user query directly, you ask an LLM to write a fictional document that would answer the query, then embed that hypothetical document. The hypothetical document tends to use vocabulary closer to actual documents than the original question, improving retrieval quality without any fine-tuning.
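
The core trick can be sketched in a few lines; `fake_llm` and the bag-of-words "embedding" below are stand-ins for a real LLM call and a real embedding model:

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM asked to write a hypothetical answer document."""
    return "The capital of France is Paris, a city on the Seine."

def embed(text: str) -> set[str]:
    """Bag-of-words stand-in for a dense embedding vector."""
    return set(text.lower().replace(".", "").replace(",", "").split())

def hyde_retrieve(query: str, corpus: list[str]) -> str:
    # Generate a hypothetical document, then embed THAT instead of the query.
    hypothetical = fake_llm(f"Write a passage answering: {query}")
    h_vec = embed(hypothetical)
    return max(corpus, key=lambda doc: len(h_vec & embed(doc)))
```

The hypothetical answer is declarative prose, so its vocabulary lands closer to the stored documents than the original interrogative query does.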

What is Self-RAG?

Self-RAG is an approach where the model itself decides when to retrieve information, what to retrieve, and how much to trust the retrieved content. The model is trained to emit special tokens that signal retrieval and self-evaluation steps. This adaptive retrieval avoids unnecessary lookups for queries the model can answer from its own knowledge while improving accuracy on knowledge-intensive queries.
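
A control-flow sketch of the idea. In the actual Self-RAG approach the model is trained to emit these decisions as special tokens; the hard-coded functions below merely stand in for that learned behavior:

```python
def needs_retrieval(query: str, known_facts: set[str]) -> bool:
    """Stand-in for the model's retrieve/no-retrieve decision token."""
    return query not in known_facts

def is_relevant(query: str, passage: str) -> bool:
    """Stand-in for the model's self-critique of a retrieved passage."""
    return any(w in passage.lower() for w in query.lower().split())

def self_rag(query: str, known_facts: set[str], retriever) -> str:
    if not needs_retrieval(query, known_facts):
        # Skip the lookup entirely for queries covered by parametric knowledge.
        return f"answer from parametric knowledge: {query}"
    passages = [p for p in retriever(query) if is_relevant(query, p)]
    return f"answer grounded in {len(passages)} passage(s)"
```

The two decision points (whether to retrieve, and whether to trust each passage) are what distinguish this from a fixed retrieve-then-generate pipeline.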

What is GraphRAG and when would you use it?

GraphRAG builds a knowledge graph from your documents during ingestion, identifying entities and their relationships. At query time it can traverse the graph to find connected information that pure vector search would miss. GraphRAG is especially useful for queries that require multi-hop reasoning across documents, like "what projects did the team that built X also work on?"
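
The multi-hop example in the question can be sketched over a tiny hand-built graph (entities, relations, and project names here are all invented for illustration):

```python
from collections import defaultdict

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)

def add_edge(src: str, relation: str, dst: str) -> None:
    graph[src].append((relation, dst))

# Toy ingestion output: entities and "built" relationships.
add_edge("SearchInfra", "built", "ProjectX")
add_edge("SearchInfra", "built", "ProjectY")
add_edge("MLPlatform", "built", "ProjectZ")

def teams_that_built(project: str) -> list[str]:
    """Hop 1: project -> team(s) that built it."""
    return [entity for entity, edges in graph.items()
            if any(rel == "built" and dst == project for rel, dst in edges)]

def other_projects(project: str) -> list[str]:
    """Hop 2: team(s) -> their other projects."""
    return [dst for team in teams_that_built(project)
            for rel, dst in graph[team]
            if rel == "built" and dst != project]
```

Answering `other_projects("ProjectX")` requires traversing through the `SearchInfra` node, a connection a pure vector search over independent chunks would likely miss.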

What is Agentic RAG?

Agentic RAG treats retrieval as a tool that an LLM agent can call, rather than a fixed pipeline step. The agent decides when to search, what to query, whether to refine the search, and when it has enough information to answer. This enables multi-step reasoning, follow-up queries, and combining retrieval with other tools like calculators or code execution.

What is contextual compression in RAG?

Contextual compression filters or rewrites retrieved chunks before sending them to the LLM, removing irrelevant content and keeping only the parts that actually answer the query. This reduces prompt size, lowers cost, and helps the LLM focus on what matters. Compression can be done with a smaller model, an LLM, or a trained extractor.
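
A sentence-level filtering sketch; a real system would use a small model or trained extractor in place of the word-overlap test:

```python
def compress(query: str, chunk: str) -> str:
    """Keep only the sentences of a retrieved chunk that overlap the query."""
    q_words = set(query.lower().split())
    kept = [s.strip() for s in chunk.split(".")
            if q_words & set(s.lower().split())]
    return ". ".join(kept)
```

The surviving text is what gets packed into the prompt, so irrelevant sentences never consume context-window budget.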

What is parent document retrieval?

Parent document retrieval is a technique where you index small chunks for precise retrieval but return their larger parent passages to the LLM. This combines the precision of small-chunk matching with the context of large-chunk reading. It addresses the tradeoff that small chunks retrieve well but lack context, while large chunks have context but match less precisely.
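
The index-small, return-large pattern can be sketched as follows (the word-window chunker and overlap scoring are illustrative stand-ins for a real splitter and embedding search):

```python
def build_index(parents: list[str], chunk_size: int = 5) -> list[tuple[str, int]]:
    """Split each parent into small word-window chunks, remembering the parent."""
    index = []  # (chunk_text, parent_id)
    for pid, parent in enumerate(parents):
        words = parent.split()
        for i in range(0, len(words), chunk_size):
            index.append((" ".join(words[i:i + chunk_size]), pid))
    return index

def retrieve_parent(query: str, index: list[tuple[str, int]], parents: list[str]) -> str:
    """Match against small chunks, but hand the LLM the full parent passage."""
    q = set(query.lower().split())
    _chunk, pid = max(index, key=lambda e: len(q & set(e[0].lower().split())))
    return parents[pid]
```

Matching happens at small-chunk granularity for precision, while the LLM receives the whole parent for context.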

What is multi-vector retrieval?

Multi-vector retrieval indexes each document with multiple embeddings, each representing different aspects. For example, you might create separate embeddings for the document's summary, hypothetical questions it answers, and its raw text. Searching across all of them improves recall, especially when queries phrase things differently from how documents are written.
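
A sketch with bag-of-words sets standing in for embeddings; the document contents, summaries, and hypothetical questions are invented for illustration:

```python
def embed(text: str) -> set[str]:
    """Bag-of-words stand-in for a dense embedding."""
    return set(text.lower().split())

docs = [
    {
        "text": "Invoices are generated nightly by the billing cron job.",
        "summary": "nightly invoice generation",
        "questions": ["when are invoices created?"],
    },
    {
        "text": "Search results are ranked with BM25.",
        "summary": "search result ranking",
        "questions": ["how are results ordered?"],
    },
]

def multi_vector_search(query: str) -> dict:
    """Score each document by its BEST view: raw text, summary, or questions."""
    q = embed(query)
    def best_score(doc: dict) -> int:
        views = [doc["text"], doc["summary"], *doc["questions"]]
        return max(len(q & embed(v)) for v in views)
    return max(docs, key=best_score)
```

Here the query matches a document's hypothetical-question view even though its phrasing differs from the raw text, which is exactly the recall gain the technique targets.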

How do you handle structured data like tables and SQL in RAG?

For structured data, vector embeddings often work poorly because exact values matter. Better approaches include text-to-SQL where the LLM writes a SQL query to your database, table extraction with specialized retrievers, and serializing rows as natural language statements before embedding. Hybrid systems often route queries to structured or unstructured retrieval based on query type.
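
Two of those ideas sketched together: serializing rows as natural-language statements, and a naive keyword router that sends aggregate-style queries to the structured path (table contents and routing keywords are illustrative):

```python
rows = [
    {"region": "EMEA", "quarter": "Q3", "revenue": 120},
    {"region": "APAC", "quarter": "Q3", "revenue": 95},
]

def serialize_row(row: dict) -> str:
    """Turn a table row into a sentence suitable for embedding."""
    return (f"In {row['quarter']}, the {row['region']} region "
            f"had revenue of {row['revenue']}.")

def route(query: str) -> str:
    """Aggregates need exact computation (e.g. text-to-SQL);
    lookups can go through the serialized-text vector path."""
    aggregate_words = {"total", "sum", "average", "max", "count"}
    return "structured" if aggregate_words & set(query.lower().split()) else "vector"
```

The router is deliberately crude; real systems often use an LLM classifier or a small trained model for this decision, but the split between exact computation and semantic lookup is the same.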

What are common failure modes of RAG systems?

Common failures include retrieval misses where relevant documents are not found, retrieval noise where irrelevant chunks dilute the prompt, "lost in the middle" effects where the LLM ignores content placed in the middle of the context, hallucinations despite retrieval, stale data when ingestion lags behind the source, and over-reliance on retrieval for queries the model could answer directly. Each failure has its own debugging approach.

How do you debug a RAG system that gives wrong answers?

Debug RAG by isolating each stage. First check whether the relevant documents exist in the index. Then verify that retrieval surfaces them by inspecting top-K results manually. If retrieval is fine, check whether the chunks contain enough context to answer. If they do, examine the prompt to see if the LLM is being instructed clearly. Most RAG bugs are retrieval bugs, not generation bugs.

How do you evaluate the quality of a RAG system?

Evaluate RAG with metrics covering retrieval and generation. For retrieval, use recall at K, precision at K, and mean reciprocal rank against a labeled dataset. For generation, measure faithfulness which checks grounding in retrieved context, answer relevance to the question, and context precision and recall. Frameworks like Ragas and TruLens automate this evaluation.
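
The three retrieval metrics mentioned above are simple to compute against labeled relevant-document IDs; a minimal sketch:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant docs that appear in the top k results."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / k

def mrr(ranked_lists: list[list[str]], relevant_sets: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant hit, averaged over queries."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1 / rank
                break
    return total / len(ranked_lists)
```

The generation-side metrics (faithfulness, answer relevance) require an LLM or human judge, which is what frameworks like Ragas wrap; the retrieval metrics above need only a labeled dataset.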