Semi-formal problem decomposition before acting — the agent sketches a reasoning structure (subgoals, assumptions, constraints) first, then executes. Reduces hallucinations and improves multi-step planning quality.
An agent that immediately acts on a complex task often takes wrong turns, makes false assumptions, and produces confident but incorrect outputs. The fix is simple and powerful: reason before acting.
Agentic reasoning is the practice of having the agent explicitly structure its understanding of a problem — identify subgoals, note assumptions, list constraints, and outline an approach — before taking any action. This structured pre-planning forces the model to confront ambiguities before they become mistakes, and produces a readable "reasoning trace" that makes the agent's behaviour auditable.
Think of it as the difference between a developer who immediately starts coding and one who first writes a brief design document. The second developer's code is typically more correct, more maintainable, and hits fewer dead ends.
Several structured frameworks have proven effective for agentic reasoning:
OODA loop (Observe-Orient-Decide-Act): from military strategy. Observe what's happening → Orient (make sense of it in context) → Decide (pick an action) → Act. Cycle rapidly.
Subgoal decomposition: break the top-level goal into a tree of subgoals. Each subgoal is independently achievable and contributes to the parent goal. Execute leaf subgoals first, building toward the root.
Assumption mapping: before acting, list explicit assumptions ("I'm assuming the user wants Python code", "I'm assuming the database is PostgreSQL"). If an assumption is wrong, the plan is wrong. Listing them makes them reviewable.
Constraint identification: list constraints upfront ("must complete in under 10 API calls", "output must be under 500 words", "cannot use deprecated functions"). Constraints filter the action space before the first tool call.
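The subgoal-decomposition pattern above can be sketched as a small tree walked leaf-first. This is a minimal illustration; the `Subgoal` structure and the example goals are hypothetical, not part of any library:

```python
from dataclasses import dataclass, field

@dataclass
class Subgoal:
    """A node in a subgoal tree; leaves are directly executable."""
    name: str
    children: list["Subgoal"] = field(default_factory=list)

def execution_order(goal: Subgoal) -> list[str]:
    # Post-order walk: execute leaf subgoals first, building toward the root
    order = []
    for child in goal.children:
        order.extend(execution_order(child))
    order.append(goal.name)
    return order

plan = Subgoal("ship feature", [
    Subgoal("write code", [Subgoal("design API"), Subgoal("implement")]),
    Subgoal("write tests"),
])
print(execution_order(plan))
# ['design API', 'implement', 'write code', 'write tests', 'ship feature']
```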
```python
import anthropic

client = anthropic.Anthropic()

REASONING_PROMPT = '''Before taking any action, produce a structured analysis:

GOAL: [Restate the task in your own words]
SUBGOALS: [List 2-5 specific subgoals needed to achieve the main goal]
ASSUMPTIONS: [List key assumptions you're making]
CONSTRAINTS: [List any constraints or limitations]
APPROACH: [Briefly outline your step-by-step approach]
RISKS: [Identify potential failure modes]

Then proceed with the task.'''

def reasoning_agent(task: str, tools: list[dict]) -> str:
    # Phase 1: Structured reasoning (no tool calls)
    reasoning_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=REASONING_PROMPT,
        messages=[{"role": "user", "content": task}]
    )
    reasoning = reasoning_response.content[0].text
    print(f"Reasoning:\n{reasoning}\n{'=' * 50}")

    # Phase 2: Execute with the reasoning as context
    execution_response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="You are a helpful assistant. Execute the plan you've outlined.",
        tools=tools,
        messages=[
            {"role": "user", "content": task},
            {"role": "assistant",
             "content": f"My analysis:\n{reasoning}\nNow I'll execute this plan:"},
        ]
    )
    return execution_response.content[0].text

# The agent reasons first, then acts
result = reasoning_agent(
    task="Audit our Python codebase's security: check for SQL injection, "
         "hardcoded secrets, and insecure dependencies.",
    tools=[
        {
            "name": "run_grep",
            "description": "Run regex search on codebase.",
            "input_schema": {
                "type": "object",
                "properties": {"pattern": {"type": "string"}},
                "required": ["pattern"],
            },
        },
    ]
)
```
Chain-of-thought (CoT) prompting ("think step by step") is a single-response technique: the model reasons within one completion before giving its final answer. It's fast and cheap but limited to what the model knows at that moment.
Agentic reasoning is multi-step and multi-turn: the model reasons, acts, observes results, re-reasons in light of new information, and acts again. It can revise its plan mid-execution when the environment pushes back ("the API returned an error — I need to try a different approach").
```
Chain-of-thought:
  User → [Reason] → Answer                                    (single completion)

Agentic reasoning:
  User → [Reason] → [Act] → [Observe] → [Re-reason] → [Act] → ... → Answer
            ↑                                ↑
       upfront plan                 adaptive re-planning
```
Use CoT when: the task can be solved from the model's training knowledge, no tools are needed, and you want a fast, single-turn response. Use agentic reasoning when: the task requires information gathering, multi-step execution, or the right approach depends on intermediate results.
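That decision rule can be written down as a small routing helper. A sketch only: the function name and criteria flags are illustrative, not a standard API:

```python
def choose_strategy(needs_tools: bool, multi_step: bool,
                    depends_on_intermediate_results: bool) -> str:
    """Route a task to chain-of-thought or agentic reasoning per the criteria above."""
    if needs_tools or multi_step or depends_on_intermediate_results:
        return "agentic"  # reason -> act -> observe -> re-reason loop
    return "cot"          # single completion, step-by-step reasoning inline

# Pure-knowledge question: no tools, one turn
print(choose_strategy(False, False, False))  # cot
# Research task whose approach depends on what it finds
print(choose_strategy(True, True, True))     # agentic
```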
A scratchpad is a designated space for the agent to "think out loud" — intermediate reasoning that isn't part of the final output. This separates the visible answer from the working-out:
```python
import re

SCRATCHPAD_SYSTEM = '''You have a private scratchpad for thinking. Use it freely.

Format your response as:
<scratchpad>
[Your reasoning, calculations, drafts here — not shown to user]
</scratchpad>
<answer>
[Your final answer here]
</answer>'''

def scratchpad_agent(task: str) -> tuple[str, str]:
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=SCRATCHPAD_SYSTEM,
        messages=[{"role": "user", "content": task}]
    )
    text = response.content[0].text
    # Parse the scratchpad and the answer out of the tagged response
    scratchpad = re.search(r'<scratchpad>(.*?)</scratchpad>', text, re.DOTALL)
    answer = re.search(r'<answer>(.*?)</answer>', text, re.DOTALL)
    return (
        scratchpad.group(1).strip() if scratchpad else "",
        answer.group(1).strip() if answer else text
    )

thinking, final_answer = scratchpad_agent(
    "A train leaves at 9am at 120km/h. Another leaves the same station "
    "at 10am at 150km/h. When does the second overtake the first?"
)
print(f"Thinking: {thinking}")
print(f"Answer: {final_answer}")
```
Claude's extended thinking feature provides a native scratchpad: the model can think at length before producing its final response, with a configurable token budget for the thinking phase:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Allow up to 10K thinking tokens
    },
    messages=[{"role": "user", "content": (
        "Design the architecture for a real-time agent monitoring system "
        "that tracks cost, latency, and error rates across 1000 concurrent "
        "agent instances."
    )}]
)

# The response contains thinking blocks followed by text blocks
for block in response.content:
    if block.type == "thinking":
        print(f"[THINKING] {block.thinking[:200]}...")  # Internal reasoning
    elif block.type == "text":
        print(f"[ANSWER] {block.text}")
```
Extended thinking is most valuable for complex planning tasks, multi-constraint problems, code design, mathematical reasoning, and any task where first-draft quality falls substantially below the model's capability. Note that thinking tokens count toward output-token usage, so budget them deliberately.
Reasoning tokens cost money too. A 2,000-token reasoning preamble before every action adds significant cost to long agent runs. Be deliberate about when you invoke structured reasoning — use it for complex planning steps, not trivial actions ("call the search tool with this query" doesn't need a 500-token reasoning preamble).
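A back-of-envelope calculation makes the overhead concrete. The price below is a hypothetical placeholder, not a current rate:

```python
def reasoning_overhead_usd(steps: int, preamble_tokens: int,
                           usd_per_million_output_tokens: float) -> float:
    """Extra output-token cost of emitting a reasoning preamble at every step."""
    return steps * preamble_tokens * usd_per_million_output_tokens / 1_000_000

# A 2,000-token preamble on each of 500 steps, at a hypothetical $15/M output tokens:
print(f"${reasoning_overhead_usd(500, 2000, 15.0):.2f}")  # $15.00 of pure preamble
```

For a handful of steps that is noise; for a long-running agent fleet it compounds fast, which is why the preamble should be reserved for genuinely complex planning steps.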
Explicit reasoning can be over-confident. "I assume X" in a plan can lock the model into a wrong assumption even when evidence to the contrary emerges. Build in explicit assumption-checking steps: after gathering initial information, revisit the assumption list and update the plan if needed.
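One way to build in that checkpoint is to re-present the assumption list alongside what the agent has observed so far and ask for a verdict on each. A sketch: the helper name and prompt wording are illustrative, and the resulting string would be sent back to the model as the next user message mid-run:

```python
def build_assumption_check(assumptions: list[str], observations: list[str]) -> str:
    """Prompt the model to revisit its stated assumptions against new evidence."""
    lines = ["Revisit your earlier assumptions against what you have now observed.",
             "", "ASSUMPTIONS:"]
    lines += [f"- {a}" for a in assumptions]
    lines += ["", "OBSERVATIONS:"]
    lines += [f"- {o}" for o in observations]
    lines += ["", "For each assumption, answer: still holds / contradicted / unverified.",
              "If any assumption is contradicted, revise the plan before continuing."]
    return "\n".join(lines)

prompt = build_assumption_check(
    assumptions=["The database is PostgreSQL"],
    observations=["Connection string begins with mysql://"],
)
# Send `prompt` as the next user turn so the agent re-plans before acting again.
```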
Users don't want to see reasoning traces. Internal reasoning is for you and the system, not the user. Always strip scratchpad content before returning to the user. Showing raw reasoning makes responses feel unpolished and can reveal sensitive intermediate steps.
| Framework | Structure | Best For | Limitation |
|---|---|---|---|
| Chain-of-Thought | Linear reasoning steps | Math, logical deduction | No backtracking; single path |
| Tree-of-Thought | Branching + pruning | Open-ended exploration, planning | High cost; complex to implement |
| ReAct | Interleaved think + act | Tool-using agents | Prone to reasoning drift over many steps |
| Scratchpad | Free-form working memory | Long-horizon problem solving | No structured verification |
| Extended thinking (Claude) | Internal chain-of-thought | Complex reasoning without prompting | Higher latency; thinking tokens add cost |
Structured reasoning before action reduces agent errors by forcing explicit state tracking. The key discipline: separate the reasoning phase (what do I know, what do I need, what is my plan?) from the action phase (call tool X with parameters Y). Agents that blend reasoning and action in the same output frequently skip reasoning steps under token pressure. Use dedicated XML or JSON blocks for reasoning vs action to enforce this separation structurally rather than relying on the model to self-regulate.
When evaluating reasoning quality in agentic systems, assess the reasoning trace separately from the final action. A correct action reached through flawed reasoning is brittle — it will fail on variations of the same task. Use an LLM-as-judge to score both the reasoning quality (is the plan logically sound and complete?) and the action accuracy (did the agent call the right tool with the right parameters?). Track these metrics separately in your eval dashboard so you can distinguish between reasoning failures and execution failures.
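A minimal shape for keeping the two scores separate might look like the following. The judge call itself is elided; the field names, threshold, and dashboard keys are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class StepEval:
    reasoning_score: float  # LLM-as-judge: is the plan logically sound and complete? (0-1)
    action_correct: bool    # did the agent call the right tool with the right parameters?

def failure_breakdown(evals: list[StepEval], threshold: float = 0.7) -> dict[str, int]:
    """Distinguish reasoning failures from execution failures for the eval dashboard."""
    return {
        "reasoning_failures": sum(1 for e in evals if e.reasoning_score < threshold),
        "execution_failures": sum(1 for e in evals
                                  if e.reasoning_score >= threshold and not e.action_correct),
    }

evals = [StepEval(0.9, True), StepEval(0.4, True), StepEval(0.8, False)]
print(failure_breakdown(evals))  # {'reasoning_failures': 1, 'execution_failures': 1}
```

The second step is the brittle case the text warns about: the action happened to succeed, but the flawed reasoning behind it is surfaced rather than masked by the correct outcome.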
Budget reasoning tokens explicitly in complex agentic tasks. Set a thinking budget (2,000 tokens for simple tasks, 8,000 for complex) and monitor usage in your cost dashboard. Tasks that consistently max out their thinking budget may benefit from upstream decomposition that routes sub-tasks to specialist agents, each requiring less reasoning than a single monolithic agent handling full complexity.
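The tiered budget can be a simple lookup, with max-outs flagged as decomposition candidates. The tier values follow the text; the 95% flagging threshold is an assumption:

```python
THINKING_BUDGETS = {"simple": 2_000, "complex": 8_000}

def pick_budget(complexity: str) -> int:
    """Select a thinking budget to pass as thinking['budget_tokens']."""
    return THINKING_BUDGETS[complexity]

def flag_for_decomposition(used_tokens: int, budget: int, frac: float = 0.95) -> bool:
    """Tasks that consistently near-max their budget are decomposition candidates."""
    return used_tokens >= frac * budget

budget = pick_budget("complex")              # 8000
print(flag_for_decomposition(7900, budget))  # True: consider routing to sub-agents
print(flag_for_decomposition(5000, budget))  # False: budget is adequate
```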