Semi-formal problem decomposition before acting — the agent sketches a reasoning structure (subgoals, assumptions, constraints) first, then executes. Reduces hallucinations and improves multi-step planning quality.
An agent that immediately acts on a complex task often takes wrong turns, makes false assumptions, and produces confident but incorrect outputs. The fix is simple and powerful: reason before acting.
Agentic reasoning is the practice of having the agent explicitly structure its understanding of a problem — identify subgoals, note assumptions, list constraints, and outline an approach — before taking any action. This structured pre-planning forces the model to confront ambiguities before they become mistakes, and produces a readable "reasoning trace" that makes the agent's behaviour auditable.
Think of it as the difference between a developer who immediately starts coding and one who first writes a brief design document. The second developer's code is typically more correct, more maintainable, and hits fewer dead ends.
Several structured frameworks have proven effective for agentic reasoning:
OODA loop (Observe-Orient-Decide-Act): from military strategy. Observe what's happening → Orient (make sense of it in context) → Decide (pick an action) → Act. Cycle rapidly.
Subgoal decomposition: break the top-level goal into a tree of subgoals. Each subgoal is independently achievable and contributes to the parent goal. Execute leaf subgoals first, building toward the root.
Assumption mapping: before acting, list explicit assumptions ("I'm assuming the user wants Python code", "I'm assuming the database is PostgreSQL"). If an assumption is wrong, the plan is wrong. Listing them makes them reviewable.
Constraint identification: list constraints upfront ("must complete in under 10 API calls", "output must be under 500 words", "cannot use deprecated functions"). Constraints filter the action space before the first tool call.
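The subgoal-decomposition pattern above can be sketched as a small tree walked leaf-first. This is a minimal illustration; the `Subgoal` structure and the example goals are hypothetical, not part of any library:

```python
from dataclasses import dataclass, field

@dataclass
class Subgoal:
    """A node in a subgoal tree; leaves are directly executable."""
    name: str
    children: list["Subgoal"] = field(default_factory=list)

def execution_order(goal: Subgoal) -> list[str]:
    # Post-order walk: execute leaf subgoals first, building toward the root
    order = []
    for child in goal.children:
        order.extend(execution_order(child))
    order.append(goal.name)
    return order

plan = Subgoal("ship feature", [
    Subgoal("write code", [Subgoal("design API"), Subgoal("implement")]),
    Subgoal("write tests"),
])
print(execution_order(plan))
# ['design API', 'implement', 'write code', 'write tests', 'ship feature']
```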
```python
import anthropic

client = anthropic.Anthropic()

REASONING_PROMPT = '''Before taking any action, produce a structured analysis:

GOAL: [Restate the task in your own words]
SUBGOALS: [List 2-5 specific subgoals needed to achieve the main goal]
ASSUMPTIONS: [List key assumptions you're making]
CONSTRAINTS: [List any constraints or limitations]
APPROACH: [Briefly outline your step-by-step approach]
RISKS: [Identify potential failure modes]

Then proceed with the task.'''

def reasoning_agent(task: str, tools: list[dict]) -> str:
    # Phase 1: Structured reasoning (no tool calls)
    reasoning_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=REASONING_PROMPT,
        messages=[{"role": "user", "content": task}]
    )
    reasoning = reasoning_response.content[0].text
    print(f"Reasoning:\n{reasoning}\n{'=' * 50}")

    # Phase 2: Execute with the reasoning as context
    execution_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="You are a helpful assistant. Execute the plan you've outlined.",
        tools=tools,
        messages=[
            {"role": "user", "content": task},
            {"role": "assistant",
             "content": f"My analysis:\n{reasoning}\nNow I'll execute this plan:"},
        ]
    )
    return execution_response.content[0].text

# The agent reasons first, then acts
result = reasoning_agent(
    task="Audit our Python codebase's security: check for SQL injection, "
         "hardcoded secrets, and insecure dependencies.",
    tools=[
        {
            "name": "run_grep",
            "description": "Run regex search on codebase.",
            "input_schema": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": ["pattern"],
            },
        },
    ]
)
```
Chain-of-thought (CoT) prompting ("think step by step") is a single-response technique: the model reasons within one completion before giving its final answer. It's fast and cheap but limited to what the model knows at that moment.
Agentic reasoning is multi-step and multi-turn: the model reasons, acts, observes results, re-reasons in light of new information, and acts again. It can revise its plan mid-execution when the environment pushes back ("the API returned an error — I need to try a different approach").
```
Chain-of-thought:
  User → [Reason] → Answer                                    (single completion)

Agentic reasoning:
  User → [Reason] → [Act] → [Observe] → [Re-reason] → [Act] → ... → Answer
            ↑                                ↑
       upfront plan                 adaptive re-planning
```
Use CoT when: the task can be solved from the model's training knowledge, no tools are needed, and you want a fast, single-turn response. Use agentic reasoning when: the task requires information gathering, multi-step execution, or the right approach depends on intermediate results.
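That decision rule can be written down as a small routing helper. A sketch only: the function name and criteria flags are illustrative, not a standard API:

```python
def choose_strategy(needs_tools: bool, multi_step: bool,
                    depends_on_intermediate_results: bool) -> str:
    """Route a task to chain-of-thought or agentic reasoning per the criteria above."""
    if needs_tools or multi_step or depends_on_intermediate_results:
        return "agentic"  # reason -> act -> observe -> re-reason loop
    return "cot"          # single completion, step-by-step reasoning inline

# Pure-knowledge question: no tools, one turn
print(choose_strategy(False, False, False))  # cot
# Research task whose approach depends on what it finds
print(choose_strategy(True, True, True))     # agentic
```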
A scratchpad is a designated space for the agent to "think out loud" — intermediate reasoning that isn't part of the final output. This separates the visible answer from the working-out:
```python
import re

SCRATCHPAD_SYSTEM = '''You have a private scratchpad for thinking. Use it freely.

Format your response as:
<scratchpad>
[Your reasoning, calculations, drafts here — not shown to user]
</scratchpad>
<answer>
[Your final answer here]
</answer>'''

def scratchpad_agent(task: str) -> tuple[str, str]:
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=SCRATCHPAD_SYSTEM,
        messages=[{"role": "user", "content": task}]
    )
    text = response.content[0].text
    # Parse the scratchpad and the answer out of the tagged response
    scratchpad = re.search(r'<scratchpad>(.*?)</scratchpad>', text, re.DOTALL)
    answer = re.search(r'<answer>(.*?)</answer>', text, re.DOTALL)
    return (
        scratchpad.group(1).strip() if scratchpad else "",
        answer.group(1).strip() if answer else text
    )

thinking, final_answer = scratchpad_agent(
    "A train leaves at 9am at 120km/h. Another leaves the same station "
    "at 10am at 150km/h. When does the second overtake the first?"
)
print(f"Thinking: {thinking}")
print(f"Answer: {final_answer}")
```
Claude's extended thinking feature provides a native scratchpad: the model can think at length before producing its final response, with a configurable token budget for the thinking phase:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allow up to 10K thinking tokens
    },
    messages=[{"role": "user", "content": (
        "Design the architecture for a real-time agent monitoring system "
        "that tracks cost, latency, and error rates across 1000 concurrent "
        "agent instances."
    )}]
)

# The response contains thinking blocks followed by text blocks
for block in response.content:
    if block.type == "thinking":
        print(f"[THINKING] {block.thinking[:200]}...")  # Internal reasoning
    elif block.type == "text":
        print(f"[ANSWER] {block.text}")
```
Extended thinking is most valuable for complex planning tasks, multi-constraint problems, code design, mathematical reasoning, and any task where first-draft quality falls substantially below the model's capability. Note that thinking tokens count toward output-token usage, so budget them deliberately.
Reasoning tokens cost money too. A 2,000-token reasoning preamble before every action adds significant cost to long agent runs. Be deliberate about when you invoke structured reasoning — use it for complex planning steps, not trivial actions ("call the search tool with this query" doesn't need a 500-token reasoning preamble).
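A back-of-envelope calculation makes the overhead concrete. The price below is a hypothetical placeholder, not a current rate:

```python
def reasoning_overhead_usd(steps: int, preamble_tokens: int,
                           usd_per_million_output_tokens: float) -> float:
    """Extra output-token cost of emitting a reasoning preamble at every step."""
    return steps * preamble_tokens * usd_per_million_output_tokens / 1_000_000

# A 2,000-token preamble on each of 500 steps, at a hypothetical $15/M output tokens:
print(f"${reasoning_overhead_usd(500, 2000, 15.0):.2f}")  # $15.00 of pure preamble
```

For a handful of steps that is noise; for a long-running agent fleet it compounds fast, which is why the preamble should be reserved for genuinely complex planning steps.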
Explicit reasoning can be over-confident. "I assume X" in a plan can lock the model into a wrong assumption even when evidence to the contrary emerges. Build in explicit assumption-checking steps: after gathering initial information, revisit the assumption list and update the plan if needed.
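One way to build in that checkpoint is to re-present the assumption list alongside what the agent has observed so far and ask for a verdict on each. A sketch: the helper name and prompt wording are illustrative, and the resulting string would be sent back to the model as the next user message mid-run:

```python
def build_assumption_check(assumptions: list[str], observations: list[str]) -> str:
    """Prompt the model to revisit its stated assumptions against new evidence."""
    lines = ["Revisit your earlier assumptions against what you have now observed.",
             "", "ASSUMPTIONS:"]
    lines += [f"- {a}" for a in assumptions]
    lines += ["", "OBSERVATIONS:"]
    lines += [f"- {o}" for o in observations]
    lines += ["", "For each assumption, answer: still holds / contradicted / unverified.",
              "If any assumption is contradicted, revise the plan before continuing."]
    return "\n".join(lines)

prompt = build_assumption_check(
    assumptions=["The database is PostgreSQL"],
    observations=["Connection string begins with mysql://"],
)
# Send `prompt` as the next user turn so the agent re-plans before acting again.
```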
Users don't want to see reasoning traces. Internal reasoning is for you and the system, not the user. Always strip scratchpad content before returning to the user. Showing raw reasoning makes responses feel unpolished and can reveal sensitive intermediate steps.
| Framework | Structure | Best For | Limitation |
|---|---|---|---|
| Chain-of-Thought | Linear reasoning steps | Math, logical deduction | No backtracking; single path |
| Tree-of-Thought | Branching + pruning | Open-ended exploration, planning | High cost; complex to implement |
| ReAct | Interleaved think + act | Tool-using agents | Prone to reasoning drift over many steps |
| Scratchpad | Free-form working memory | Long-horizon problem solving | No structured verification |
| Extended thinking (Claude) | Internal chain-of-thought | Complex reasoning without prompting | Higher latency; thinking tokens add cost |
Structured reasoning before action reduces agent errors by forcing explicit state tracking. The key discipline: separate the reasoning phase (what do I know, what do I need, what is my plan?) from the action phase (call tool X with parameters Y). Agents that blend reasoning and action in the same output frequently skip reasoning steps under token pressure. Use dedicated XML or JSON blocks for reasoning vs action to enforce this separation structurally rather than relying on the model to self-regulate.
When evaluating reasoning quality in agentic systems, assess the reasoning trace separately from the final action. A correct action reached through flawed reasoning is brittle — it will fail on variations of the same task. Use an LLM-as-judge to score both the reasoning quality (is the plan logically sound and complete?) and the action accuracy (did the agent call the right tool with the right parameters?). Track these metrics separately in your eval dashboard so you can distinguish between reasoning failures and execution failures.
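A minimal shape for keeping the two scores separate might look like the following. The judge call itself is elided; the field names, threshold, and dashboard keys are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class StepEval:
    reasoning_score: float  # LLM-as-judge: is the plan logically sound and complete? (0-1)
    action_correct: bool    # did the agent call the right tool with the right parameters?

def failure_breakdown(evals: list[StepEval], threshold: float = 0.7) -> dict[str, int]:
    """Distinguish reasoning failures from execution failures for the eval dashboard."""
    return {
        "reasoning_failures": sum(1 for e in evals if e.reasoning_score < threshold),
        "execution_failures": sum(1 for e in evals
                                  if e.reasoning_score >= threshold and not e.action_correct),
    }

evals = [StepEval(0.9, True), StepEval(0.4, True), StepEval(0.8, False)]
print(failure_breakdown(evals))  # {'reasoning_failures': 1, 'execution_failures': 1}
```

The second step is the brittle case the text warns about: the action happened to succeed, but the flawed reasoning behind it is surfaced rather than masked by the correct outcome.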
Budget reasoning tokens explicitly in complex agentic tasks. Set a thinking budget (2,000 tokens for simple tasks, 8,000 for complex) and monitor usage in your cost dashboard. Tasks that consistently max out their thinking budget may benefit from upstream decomposition that routes sub-tasks to specialist agents, each requiring less reasoning than a single monolithic agent handling full complexity.
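The tiered budget can be a simple lookup, with max-outs flagged as decomposition candidates. The tier values follow the text; the 95% flagging threshold is an assumption:

```python
THINKING_BUDGETS = {"simple": 2_000, "complex": 8_000}

def pick_budget(complexity: str) -> int:
    """Select a thinking budget to pass as thinking['budget_tokens']."""
    return THINKING_BUDGETS[complexity]

def flag_for_decomposition(used_tokens: int, budget: int, frac: float = 0.95) -> bool:
    """Tasks that consistently near-max their budget are decomposition candidates."""
    return used_tokens >= frac * budget

budget = pick_budget("complex")              # 8000
print(flag_for_decomposition(7900, budget))  # True: consider routing to sub-agents
print(flag_for_decomposition(5000, budget))  # False: budget is adequate
```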