Dive into advanced strategies such as Chain-of-Thought and the ReAct pattern. These techniques enhance the reasoning and decision-making capabilities of LLMs, enabling them to handle more complex tasks.
10 audio · 3:14
Nortren
What is chain-of-thought prompting?
0:21
Chain-of-thought, or CoT, prompting asks the model to show its reasoning step by step before giving a final answer. Instead of jumping directly to the answer, the model writes out intermediate steps. For complex tasks like math, logic, or multi-hop questions, CoT dramatically improves accuracy compared to direct answering. It was introduced by Wei and colleagues in 2022.
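The pattern can be sketched as a few-shot prompt whose worked example spells out its reasoning. The helper `build_cot_prompt` below is a hypothetical illustration, not a library function:

```python
# The worked example shows explicit reasoning, so the model imitates
# that step-by-step pattern when answering the new question.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked example, then leave 'A:' open for the model."""
    return f"{COT_EXAMPLE}\n\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "A cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples are left?"
)
```

The resulting string would be sent to any LLM completion endpoint; the trailing `A:` invites the model to continue with its own reasoning chain.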
How much does chain-of-thought improve LLM accuracy?
0:21
On the GSM8K mathematical reasoning benchmark, the original CoT paper showed PaLM 540B going from 17.7 percent accuracy with standard few-shot to 58.1 percent with chain-of-thought. More recent models combined with CoT achieve over 90 percent on the same benchmark. The improvement varies by task and model but is consistently large for problems requiring multiple reasoning steps.
Zero-shot CoT is a simpler version of chain-of-thought that requires no examples. You just append "Let's think step by step" to the prompt, and the model generates reasoning steps automatically. Discovered by Kojima and colleagues in 2022, this trick works because instruction-tuned models learned to interpret the phrase as a request for explicit reasoning.
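Since zero-shot CoT is just a fixed suffix, it amounts to a one-line prompt transformation, sketched here with an illustrative helper:

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: no worked examples, just the trigger phrase appended."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot(
    "A train travels 60 km in 40 minutes. What is its speed in km/h?"
)
```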
Use CoT for tasks requiring multi-step reasoning, math problems, logical deduction, complex planning, or any task where the model tends to make errors when answering directly. CoT is unnecessary for simple lookup tasks, classification with clear categories, or tasks where the answer follows trivially from the input. CoT increases token usage, so use it where the accuracy gain justifies the cost.
What is the difference between chain-of-thought and reasoning models?
0:21
Chain-of-thought is a prompting technique that any general-purpose LLM can use. Reasoning models like OpenAI o1, o3, and Claude with extended thinking are trained specifically to perform internal reasoning before responding, often using reinforcement learning on math and code. Reasoning models produce far better results on hard problems but cost more, respond more slowly, and do not need explicit CoT prompting.
Self-consistency is a technique where you sample multiple chain-of-thought completions from the model and pick the most common final answer by majority vote. It improves accuracy on tasks where there are many valid reasoning paths to the same answer. The cost is multiple forward passes, but the accuracy gain is often worth it for hard math and reasoning problems.
How does self-consistency improve over standard chain-of-thought?
0:18
Standard CoT relies on a single reasoning path that may have errors. Self-consistency exploits the fact that multiple correct reasoning paths usually converge to the same answer, while incorrect paths diverge. By sampling 10 to 40 chains and taking the majority answer, accuracy on benchmarks like GSM8K improves by 12 to 18 percent over single-path CoT.
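The sample-and-vote loop can be sketched as follows; `sample` stands in for a temperature-sampled model call, and the regex answer extraction and the scripted `fake_sample` model are illustrative assumptions:

```python
import re
from collections import Counter
from itertools import cycle
from typing import Callable

def self_consistency(sample: Callable[[str], str], prompt: str, n: int = 10) -> str:
    """Sample n CoT completions, extract each final answer, majority-vote."""
    answers = []
    for _ in range(n):
        completion = sample(prompt)
        # Assumes completions end with "The answer is <number>."
        match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion)
        if match:
            answers.append(match.group(1))
    return Counter(answers).most_common(1)[0][0]

# Deterministic stand-in for a sampled model: two of every three chains
# reach the correct answer, one goes wrong mid-chain.
_completions = cycle([
    "2 cans of 3 is 6. 5 + 6 = 11. The answer is 11.",
    "5 + 6 = 11. The answer is 11.",
    "2 cans of 3 is 7. 5 + 7 = 12. The answer is 12.",
])

def fake_sample(prompt: str) -> str:
    return next(_completions)

result = self_consistency(fake_sample, "Q: ...", n=9)  # → "11"
```

In practice the chains would come from the same model at a nonzero sampling temperature, so that different reasoning paths are explored.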
Least-to-most prompting decomposes a complex problem into a sequence of simpler subproblems, solving them one at a time, with each solution feeding into the next. It is especially effective for tasks where the model struggles to handle the full problem at once but can solve each piece. It was introduced as an improvement over CoT for compositional reasoning.
Decomposition prompting is a general pattern where you ask the LLM to break a complex task into smaller subtasks before solving any of them. Once decomposed, each subtask can be tackled independently and combined into a final answer. It is particularly useful when the model would otherwise produce shallow or incomplete responses to complex questions.
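The least-to-most flow described above, decompose first, then solve subproblems in order while feeding answers forward, can be sketched like this. The prompt wording and the scripted `fake_llm` model are illustrative assumptions:

```python
from typing import Callable, List

def least_to_most(llm: Callable[[str], str], problem: str) -> str:
    """Stage 1: ask for a decomposition. Stage 2: solve subquestions in
    order, appending each answer to the context for the next step."""
    decomposition = llm(
        f"Break this problem into simpler subquestions, one per line:\n{problem}"
    )
    subquestions: List[str] = [
        q.strip() for q in decomposition.splitlines() if q.strip()
    ]
    context, answer = problem, ""
    for sub in subquestions:
        answer = llm(f"{context}\nQ: {sub}\nA:")
        context += f"\nQ: {sub}\nA: {answer}"
    return answer  # the last subquestion's answer is the final answer

# Scripted stand-in model so the control flow can be run end to end.
_ANSWERS = {
    "How many balls are in the 2 cans?": "6",
    "How many balls does Roger have in total?": "11",
}

def fake_llm(prompt: str) -> str:
    if prompt.startswith("Break"):
        return "\n".join(_ANSWERS)
    for q, a in _ANSWERS.items():
        if prompt.endswith(f"Q: {q}\nA:"):
            return a
    return "?"

result = least_to_most(
    fake_llm, "Roger has 5 balls and buys 2 cans of 3 balls each."
)  # → "11"
```

Because each subquestion's answer is appended to the context, later steps can build on earlier results, which is what makes this pattern effective for compositional problems.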
What is the limitation of chain-of-thought reasoning?
0:19
CoT does not always produce correct answers, even when the reasoning looks plausible. Models can confidently produce wrong reasoning, hallucinate intermediate facts, or make arithmetic errors mid-chain. CoT also does not give the model new information; it cannot solve problems that depend on facts the model never learned. For knowledge-intensive tasks, combine CoT with retrieval.
---