
Basic Theory

🌱 Level 1 β€” Beginner

Reasoning

AI reasoning capabilities: chain-of-thought prompting, thinking (reasoning) models, and logical inference.

Reasoning is one of the most important and rapidly evolving capabilities of modern AI. While early LLMs could generate fluent text, they often struggled with multi-step logic, math, and complex problem-solving. The introduction of chain-of-thought prompting and dedicated reasoning models has dramatically improved these capabilities.

Models like OpenAI o1/o3, DeepSeek-R1, and QwQ use "thinking tokens" β€” they reason step-by-step internally before producing a final answer. This mirrors the human distinction between fast intuitive thinking (System 1) and slow deliberate reasoning (System 2). Understanding these capabilities and their limits is crucial for knowing when to trust AI outputs.

Key Topics Covered
Chain-of-Thought (CoT) Prompting
Asking models to "think step by step" dramatically improves accuracy on complex problems. Instead of jumping to answers, the model shows its work β€” breaking problems into manageable steps.
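The technique itself is just a change in prompt wording. A minimal sketch (the bat-and-ball question and the "think step by step" trigger phrase are classic illustrative examples, not from this course page):

```python
# Two ways to pose the same question. The first invites a one-shot guess;
# the second elicits chain-of-thought, which improves accuracy on
# multi-step problems.
question = (
    "A bat and a ball cost $1.10 together. "
    "The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

direct_prompt = question  # model may jump to the intuitive (wrong) "$0.10"

cot_prompt = question + "\nLet's think step by step."  # classic CoT trigger
```

Either string would be sent to the model as-is; only the trailing instruction differs, yet it changes how the model allocates its output tokens.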
Reasoning Models
OpenAI o1/o3, DeepSeek-R1, and QwQ (Alibaba) are specifically trained for multi-step reasoning. They spend extra inference-time compute to "think longer" before answering.
Thinking Tokens
Internal reasoning traces generated before the final answer. These tokens are the model's "scratch pad" β€” working through logic, checking steps, considering alternatives.
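Open reasoning models such as DeepSeek-R1 emit this scratch pad between `<think>` tags, so the trace can be separated from the user-visible answer. A minimal parsing sketch (the exact tag format varies by model):

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style output into (reasoning trace, final answer).

    Assumes the reasoning is wrapped in <think>...</think> before the answer;
    if no tags are present, the whole output is treated as the answer.
    """
    m = re.match(r"(?s)\s*<think>(.*?)</think>\s*(.*)", raw)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", raw.strip()
```

Hiding or collapsing the first element of the tuple is how chat UIs typically present "thinking" separately from the answer.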
Extended Thinking
Allocating more compute at inference time for harder problems. The model can "think longer" on complex questions, trading speed for accuracy. Test-time compute scaling.
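One simple form of test-time compute scaling is self-consistency: sample several independent reasoning passes and take a majority vote, trading compute for reliability. A toy sketch (the `sample_answer` stub is hypothetical; a real system would sample a full chain of thought from the model):

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> str:
    # Stub standing in for one stochastic reasoning pass of a model,
    # which lands on the right answer ("0.05") only 70% of the time.
    return rng.choices(["0.05", "0.10"], weights=[0.7, 0.3])[0]

def self_consistency(n_samples: int, seed: int = 0) -> str:
    # More samples = more inference-time compute; a majority vote over
    # independent passes is more reliable than any single pass.
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

Raising `n_samples` is the "think longer" knob: each extra sample costs another full pass but shrinks the chance that a fluke wrong answer wins the vote.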
Math Reasoning
Solving competition-level math (AIME, AMC), formal proofs, symbolic manipulation. Reasoning models have made dramatic progress here β€” approaching human expert level.
Code Reasoning
Debugging complex codebases, analyzing architecture, implementing sophisticated algorithms. Code reasoning is one of the most practically valuable AI capabilities.
Logical Inference
Syllogisms, deduction, constraint satisfaction, planning. Models can follow logical rules but still struggle with certain types of novel reasoning and common-sense physics.
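Constraint satisfaction can be made concrete with a tiny puzzle (invented here for illustration): three runners finish a race; Ann beat Bob, Cy was not last, and Cy did not win. Exhaustive search over orderings is the mechanical version of the deduction reasoning models are asked to perform in natural language:

```python
from itertools import permutations

people = ["Ann", "Bob", "Cy"]

def satisfies(order: tuple) -> bool:
    # order[0] is first place, order[2] is last.
    pos = {name: i for i, name in enumerate(order)}
    return (
        pos["Ann"] < pos["Bob"]  # Ann beat Bob
        and pos["Cy"] != 2       # Cy was not last
        and pos["Cy"] != 0       # Cy did not win
    )

solutions = [order for order in permutations(people) if satisfies(order)]
```

The three constraints leave exactly one consistent ordering (Ann, Cy, Bob); a model doing logical inference must reach the same conclusion without enumerating anything explicitly.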
System 1 vs System 2 Thinking
Kahneman's framework applied to AI: System 1 = fast intuitive responses (standard LLM), System 2 = slow deliberate reasoning (reasoning models with thinking tokens).
Current Limitations
Reasoning models still fail on truly novel problems, can produce convincing but wrong chains of reasoning, and may overthink simple questions. Verification remains essential.
Reasoning Benchmarks
MATH (competition math), GSM8K (grade school), ARC-AGI (general reasoning), SWE-bench (real software engineering), Codeforces (competitive programming).
Key Terms
Chain-of-Thought: Technique where models explain their reasoning step-by-step before giving an answer.
Thinking Tokens: Internal reasoning traces generated by reasoning models before the final output.
System 1/System 2: Kahneman's framework of fast intuitive vs. slow deliberate thinking, applied to AI.
Test-Time Compute: Allocating more processing at inference time to improve reasoning on harder problems.