LLM Engineer Interview Questions: Prompt Engineering, Few-Shot, Chain-of-Thought, Structured Outputs

LLM Engineer Interview Questions: Prompt Engineering, Few-Shot, Chain-of-Thought, Structured Outputs

Delve into advanced prompt engineering techniques including few-shot learning, chain-of-thought reasoning, and structured outputs. Mastering these skills is essential for developing effective interaction strategies with LLMs.

11 audio · 3:24

Nortren·

What is prompt engineering?

0:19
Prompt engineering is the practice of designing inputs to a language model to produce the desired outputs without changing model weights. It includes choosing instructions, examples, formatting, and structure. Good prompt engineering can dramatically improve model performance on a task without any training, making it the cheapest and fastest optimization lever.

What is zero-shot prompting?

0:14
Zero-shot prompting asks the model to perform a task with only an instruction and no examples. Modern instruction-tuned LLMs handle many tasks zero-shot well. It is the simplest prompting style and works best when the task is common and clearly described.

What is few-shot prompting?

0:18
Few-shot prompting includes a small number of input-output examples in the prompt before the actual query. The examples teach the model the desired format and reasoning style. Few-shot is especially useful for tasks with specific output formats, ambiguous instructions, or domain-specific patterns the model would not infer from the instruction alone.

What is chain-of-thought prompting?

0:19
Chain-of-thought, or CoT, prompting asks the model to show its reasoning step by step before giving a final answer. For complex tasks like math or logic, CoT dramatically improves accuracy compared to direct answering. The simplest form just adds "Let's think step by step" to the prompt; more advanced versions provide examples of step-by-step reasoning.

What is the difference between chain-of-thought and reasoning models?

0:19
Chain-of-thought is a prompting technique applied to general models. Reasoning models like OpenAI o1, o3, and Claude with extended thinking are trained specifically to perform internal reasoning before responding, often using reinforcement learning on math and code. Reasoning models produce far better results on complex problems but cost more and respond more slowly.

What is self-consistency in prompting?

0:17
Self-consistency is a technique where you sample multiple chain-of-thought completions from the model and pick the most common final answer by majority vote. It improves accuracy on tasks where there are many valid reasoning paths to the same answer. The cost is multiple forward passes, but the accuracy gain is often worth it for hard problems.

What is ReAct prompting?

0:21
ReAct stands for Reasoning and Acting. It is a prompting pattern where the model alternates between thinking and using tools, producing traces like "Thought - Action - Observation - Thought - Action - Final Answer." ReAct enables LLM agents to use external tools like search, calculators, and APIs while showing their reasoning, making behavior more interpretable and debuggable.

What is structured output and why does it matter?

0:20
Structured output means constraining the model to produce responses in a specific format like JSON, XML, or a function call signature. It matters because production systems usually need parseable outputs, not free-form text. Modern LLM APIs from OpenAI, Anthropic, and Google all support structured output through schemas, eliminating the need for fragile prompt-based JSON extraction.

How does function calling work in modern LLMs?

0:18
Function calling lets you describe a set of available tools to the model with their names, descriptions, and parameter schemas. The model chooses which function to call and produces arguments that match the schema. The application then executes the function and returns the result. This is the foundation of LLM agents and tool use.

What is the difference between system, user, and assistant messages?

0:19
System messages set the model's role, behavior, and constraints, processed before any user input. User messages are inputs from the user. Assistant messages are previous responses from the model. The model is trained on this multi-role format, and structuring conversations correctly significantly improves output quality compared to mixing everything into one prompt.

What is prompt injection and how do you defend against it?

0:20
Prompt injection is an attack where untrusted input contains instructions that trick the model into ignoring its system prompt or revealing sensitive information. Defenses include separating system from user input clearly, validating outputs, using structured output, running content filters on responses, and sandboxing tool execution. There is no perfect defense; defense in depth is essential. ---