LLM Engineer Interview Questions: Prompt Engineering, Few-Shot, Chain-of-Thought, Structured Outputs

Question 1

What is prompt engineering?

Accepted Answer

Prompt engineering is the practice of designing inputs to a language model to produce the desired outputs without changing model weights. It includes choosing instructions, examples, formatting, and structure. Good prompt engineering can dramatically improve model performance on a task without any training, making it the cheapest and fastest optimization lever.

Question 2

What is zero-shot prompting?

Accepted Answer

Zero-shot prompting asks the model to perform a task with only an instruction and no examples. Modern instruction-tuned LLMs handle many tasks zero-shot well. It is the simplest prompting style and works best when the task is common and clearly described.

Question 3

What is few-shot prompting?

Accepted Answer

Few-shot prompting includes a small number of input-output examples in the prompt before the actual query. The examples teach the model the desired format and reasoning style. Few-shot is especially useful for tasks with specific output formats, ambiguous instructions, or domain-specific patterns the model would not infer from the instruction alone.

Question 4

What is chain-of-thought prompting?

Accepted Answer

Chain-of-thought, or CoT, prompting asks the model to show its reasoning step by step before giving a final answer. For complex tasks like math or logic, CoT dramatically improves accuracy compared to direct answering. The simplest form just adds "Let's think step by step" to the prompt; more advanced versions provide examples of step-by-step reasoning.

Question 5

What is the difference between chain-of-thought and reasoning models?

Accepted Answer

Chain-of-thought is a prompting technique applied to general models. Reasoning models like OpenAI o1, o3, and Claude with extended thinking are trained specifically to perform internal reasoning before responding, often using reinforcement learning on math and code. Reasoning models produce far better results on complex problems but cost more and respond more slowly.

Question 6

What is self-consistency in prompting?

Accepted Answer

Self-consistency is a technique where you sample multiple chain-of-thought completions from the model and pick the most common final answer by majority vote. It improves accuracy on tasks where there are many valid reasoning paths to the same answer. The cost is multiple forward passes, but the accuracy gain is often worth it for hard problems.

Question 7

What is ReAct prompting?

Accepted Answer

ReAct stands for Reasoning and Acting. It is a prompting pattern where the model alternates between thinking and using tools, producing traces like "Thought - Action - Observation - Thought - Action - Final Answer." ReAct enables LLM agents to use external tools like search, calculators, and APIs while showing their reasoning, making behavior more interpretable and debuggable.

Question 8

What is structured output and why does it matter?

Accepted Answer

Structured output means constraining the model to produce responses in a specific format like JSON, XML, or a function call signature. It matters because production systems usually need parseable outputs, not free-form text. Modern LLM APIs from OpenAI, Anthropic, and Google all support structured output through schemas, eliminating the need for fragile prompt-based JSON extraction.

Question 9

How does function calling work in modern LLMs?

Accepted Answer

Function calling lets you describe a set of available tools to the model with their names, descriptions, and parameter schemas. The model chooses which function to call and produces arguments that match the schema. The application then executes the function and returns the result. This is the foundation of LLM agents and tool use.

Question 10

What is the difference between system, user, and assistant messages?

Accepted Answer

System messages set the model's role, behavior, and constraints, processed before any user input. User messages are inputs from the user. Assistant messages are previous responses from the model. The model is trained on this multi-role format, and structuring conversations correctly significantly improves output quality compared to mixing everything into one prompt.

Question 11

What is prompt injection and how do you defend against it?

Accepted Answer

Prompt injection is an attack where untrusted input contains instructions that trick the model into ignoring its system prompt or revealing sensitive information. Defenses include separating system from user input clearly, validating outputs, using structured output, running content filters on responses, and sandboxing tool execution. There is no perfect defense; defense in depth is essential.

---