Agentic Reasoning

Agent Planning & ReAct

How LLM agents decide what to do next — ReAct, plan-and-execute, reflection, and tree search

Reason → Act → Observe: the ReAct loop
Plan first or act first: the planning tradeoff
Reflection loops: quality boosters
Contents
  1. The agent loop
  2. ReAct interleaved
  3. Plan-and-execute
  4. Reflection & critique
  5. Tree search & MCTS
  6. Tool selection & recovery
  7. Memory types & state
01 — Foundation

The Agent Loop

The core loop never changes: (1) receive task, (2) reason about next action, (3) call a tool/act, (4) observe result, (5) update state, repeat until done. This is the heartbeat of every agent.
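The five steps above can be sketched as a minimal loop. This is an illustrative skeleton, not any framework's API: `reason` stands in for an LLM call that picks the next action, and `tools` is a plain dict of callables.

```python
def run_agent(task, reason, tools, max_steps=10):
    """Generic agent loop: receive task -> reason -> act -> observe -> update."""
    state = {"task": task, "observations": []}
    for _ in range(max_steps):
        # (2) reason about the next action, given everything observed so far
        action, args = reason(state)
        if action == "FINISH":
            return args  # args carries the final answer
        # (3) act, (4) observe the result, (5) update state, then repeat
        observation = tools[action](args)
        state["observations"].append((action, args, observation))
    raise RuntimeError("max_steps exceeded without FINISH")
```

Everything else in this article (ReAct, plan-and-execute, reflection) is a variation on how `reason` is implemented and what goes into `state`.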

Why the Loop Matters

Without planning: one-shot LLM call to generate everything → brittle, no recovery from errors, fails on unseen edge cases. With planning: multi-step, each step informed by previous results → adaptive, self-correcting, can recover from mistakes by replanning.

Basic Agent Trace

Task: "Summarize the top 3 AI papers from this week"

Step 1
  Thought: I need to search for recent AI papers
  Action: search("AI papers arxiv 2024 this week")
  Observation: [list of 20 papers returned]
Step 2
  Thought: I have results, now I should read the top 3 abstracts
  Action: fetch_abstract(paper_ids[0:3])
  Observation: [3 abstracts]
Step 3
  Thought: I have enough context to write the summary
  Action: FINISH
  Output: "Here are the top 3 papers..."

Each step depends on the previous observation. The agent reasons explicitly before acting. The observation updates context. This loop enables agents to recover from failures and adapt.

02 — Dominant Pattern

ReAct: Reason + Act Interleaved

ReAct (Yao et al. 2022): interleave chain-of-thought reasoning (Thought) with concrete actions (Action/Observation) in the same context. Why it works: explicit Thought forces the model to commit to a plan before acting; Observation updates the context; looping gives recovery from errors.

Comparison: Naive vs CoT vs ReAct

Approach      | Reasoning | Grounding    | Error recovery | Multi-step
Direct answer | None      | None         | None           | No
CoT only      | Internal  | None         | Partial        | Limited
ReAct         | Explicit  | Tool results | Yes            | Yes

ReAct Prompt Structure & Example

System: You have access to these tools:
  search(query), calculator(expr), finish(answer)

Use this format:
  Thought: [your reasoning]
  Action: tool_name(args)
  Observation: [tool result, injected by the system]
  ... (repeat until you call finish)

Question: What is the GDP of France divided by its population?

Thought: I need France's GDP and population figures
Action: search("France GDP 2023")
Observation: France GDP 2023: $3.05 trillion
Thought: Now I need the population
Action: search("France population 2023")
Observation: France population: 68 million
Thought: I can now calculate
Action: calculator(3.05e12 / 68e6)
Observation: 44852.94
Thought: I have the answer
Action: finish("France's GDP per capita is approximately $44,853")
ReAct is the dominant pattern in production agents today. It's the foundation of LangGraph's agent executor, OpenAI's function-calling loop, and most agent SDKs. Simple, proven, and remarkably effective.
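In practice, the system running this loop has to extract the `Action: tool_name(args)` line from each model completion. Here is a minimal sketch of that parsing step, assuming the exact text format shown above; the regex and function name are illustrative, not taken from any particular SDK.

```python
import re

# Matches lines of the form: Action: tool_name(raw_args)
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)

def parse_action(llm_output):
    """Extract the last Action line from a ReAct-formatted completion.

    Returns (tool_name, raw_args) or None if the model emitted no action,
    in which case the loop should re-prompt or stop.
    """
    matches = ACTION_RE.findall(llm_output)
    return matches[-1] if matches else None
```

Real agent SDKs avoid this fragility by using structured function-calling APIs instead of free-text parsing, but the control flow is the same.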
03 — Structured Approach

Plan-and-Execute

Two-phase: (1) generate a full plan upfront (list of steps), (2) execute each step with a subagent or tool. Planner = LLM call that produces a list of steps; Executor = ReAct agent for each step, passing results forward.
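The two phases can be sketched as a thin skeleton. Here `planner` and `executor` are stand-in callables (in a real system, the planner is an LLM call that emits a step list and the executor is a ReAct subagent); only the control flow is the point.

```python
def plan_and_execute(task, planner, executor):
    """Two-phase agent: plan the whole task upfront, then run each step.

    planner(task) -> list of step descriptions (auditable before execution)
    executor(step, prior_results) -> result of that step
    Results are threaded forward so step N can use the outputs of steps 1..N-1.
    """
    plan = planner(task)            # phase 1: full plan, reviewable upfront
    results = []
    for step in plan:               # phase 2: execute steps in order
        results.append(executor(step, list(results)))
    return plan, results
```

Because `plan` is materialized before anything runs, a human or a checker can veto or reorder steps, and independent steps can be dispatched in parallel instead of sequentially.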

Advantages and Disadvantages

Advantages: Plan is auditable, explainable, interruptible; enables parallelism (independent steps can run in parallel); stakeholders can review before execution. Disadvantages: upfront plan can be wrong; hard to re-plan mid-execution without restarting; rigid step decomposition fails on emerging needs.

Plan-Then-Execute Example

Planner output:
  1. Search for the company's latest 10-K filing
  2. Extract revenue figures for the last 3 years
  3. Calculate YoY growth rates
  4. Identify the top 3 risk factors
  5. Write an executive summary

Executor: spawns a subagent for each step, passing results forward
  Step 1 result → Step 2 input → Step 3 input → ... → Step 5
⚠️ Plan-and-execute works for structured tasks with known step shapes. For open-ended research or tasks requiring reactive replanning, stick with ReAct. The rigid structure breaks when the task surprises you.
04 — Learning from Failure

Reflection and Self-Critique

Reflexion (Shinn et al. 2023): after a failed attempt, generate a textual critique, store it in episodic memory, try again. Self-RAG: LLM decides whether to retrieve, generates, critiques its own answer for factuality. The reflection loop: generate → critique → refine → generate again.

Reflexion Loop Example

Attempt 1: [write Python function]
  Test result: AssertionError, wrong output
  Reflection: "I misread the edge case for empty input.
    Next attempt: handle len(arr)==0 separately."
Attempt 2: [revised function with empty check]
  Test result: All tests pass ✓

ReAct vs Plan-and-Execute vs Reflexion

Pattern          | Replanning      | Auditability           | Best for
ReAct            | Every step      | Medium (trace)         | Open-ended tasks
Plan-and-execute | On failure only | High (plan visible)    | Structured tasks
Reflexion        | After failure   | High (critique stored) | Code, competitive tasks
06 — Execution

Tool Selection and Error Recovery

Tool selection failure modes: wrong tool chosen, correct tool with wrong args, tool returns error, infinite loop. Mitigations are critical for production reliability.

Mitigation Strategies

Tool descriptions: must be unambiguous. Bad: "search". Good: "web_search(query: str) → list of {title, url, snippet}". Validation: validate tool args before calling. Max iterations: enforce a ceiling on loop depth. Error handling: handle tool errors gracefully, don't let exceptions bubble up.
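The "validate tool args before calling" mitigation can be sketched with a minimal schema check. The `{name: type}` schema format here is an illustrative stand-in for whatever JSON-schema validation a real framework provides.

```python
def validate_args(schema, args):
    """Check tool args against a {name: type} schema before calling the tool.

    Returns a list of error strings (empty = valid), so the agent loop can
    feed failures back into the next reasoning step instead of crashing.
    """
    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing required arg: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"{name} should be {expected.__name__}, "
                          f"got {type(args[name]).__name__}")
    for name in args:
        if name not in schema:
            errors.append(f"unexpected arg: {name}")
    return errors
```

Returning errors as strings (rather than raising) matters: the error text goes back into the model's context as an observation, giving it a chance to correct the call.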

Robust Tool-Call with Retry

for attempt in range(3):
    try:
        result = tool_executor(tool_name, args)
        break
    except ToolError as e:
        # Surface the failure to the model instead of crashing
        messages.append({"role": "tool", "content": f"Error: {e}"})
        messages.append({"role": "user",
                         "content": "The tool failed. Try a different approach."})
        result = llm.invoke(messages)  # re-plan
⚠️ The most common production bug: agent loops forever calling the wrong tool. Always enforce max_iterations and surface failures explicitly in the trace. Log failed tool calls prominently.
07 — State Management

Memory Types and State

Agents need multiple memory systems working together. Each serves a different purpose and has different constraints.

Four Memory Types

Short-term (in-context): conversation history, scratchpad, tool results — limited by context window, cleared at session end. Long-term (external): vector store of past experiences, episodic memory, fact cache — persistent across sessions, queryable. Procedural (prompt): the agent's "skills" — encoded in system prompt, updated via prompt engineering, static within session. Checkpoint (persisted state): full graph state saved to DB (LangGraph pattern), enables pause/resume across sessions.
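Short-term memory's defining constraint is the context window, so every agent needs some trimming policy. Here is one minimal sketch: keep the system message plus the most recent turns that fit a token budget. The whitespace-split `count_tokens` default is a crude proxy, not a real tokenizer, and the message dict shape is illustrative.

```python
def trim_history(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"].split())):
    """Keep the system message plus the newest turns that fit the budget.

    Older turns are dropped first; the system message (index 0) is always
    kept because it carries the agent's procedural instructions.
    """
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system)
    kept = []
    for msg in reversed(rest):          # walk from newest to oldest
        cost = count_tokens(msg)
        if cost > budget:
            break                       # this turn and everything older is dropped
        kept.append(msg)
        budget -= cost
    return [system] + kept[::-1]        # restore chronological order
```

Production systems layer the other memory types on top of this: turns that fall out of the window can be summarized or written to a vector store instead of being lost outright.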

Memory Comparison Table

Type               | Storage        | Lifespan         | Best for
In-context         | Token window   | Current session  | Working memory, recent history
Vector/episodic    | External DB    | Persistent       | Past task recall, personalization
KV cache           | GPU VRAM       | Request lifetime | Token reuse, prefix caching
Checkpointed state | DB (LangGraph) | Persistent       | Long-running workflows, resume

Memory Tools and Platforms

LangGraph (framework): first-class state checkpoint persistence
mem0 (memory): long-term episodic memory layer for agents
Zep (memory): conversation memory with semantic search
MemGPT / Letta (memory): virtual context management for long conversations
OpenAI Assistants (platform): managed memory + file storage + tool calling
LangSmith (tracing): trace and evaluate agent decisions
Langfuse (tracing): open-source observability for long-running agents
W&B Traces (platform): lightweight ML ops integration
08 — Further Reading

References

Key Papers