How LLM agents decide what to do next — ReAct, plan-and-execute, reflection, and tree search
The core loop never changes: (1) receive task, (2) reason about next action, (3) call a tool/act, (4) observe result, (5) update state, repeat until done. This is the heartbeat of every agent.
Without planning: one-shot LLM call to generate everything → brittle, no recovery from errors, fails on unseen edge cases. With planning: multi-step, each step informed by previous results → adaptive, self-correcting, can recover from mistakes by replanning.
Each step depends on the previous observation. The agent reasons explicitly before acting. The observation updates context. This loop enables agents to recover from failures and adapt.
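That loop can be sketched in a few lines. This is a minimal illustration, not any framework's API: `llm` is a hypothetical callable that returns either a tool call or a final answer, and `tools` is a plain dict of callables.

```python
# Minimal agent loop sketch: reason -> act -> observe -> update, until done.
# `llm` and `tools` are hypothetical stand-ins, not a specific library's API.

def run_agent(task, llm, tools, max_steps=10):
    history = [f"Task: {task}"]              # working state: everything seen so far
    for _ in range(max_steps):
        decision = llm("\n".join(history))   # (2) reason about the next action
        if decision["type"] == "final":
            return decision["answer"]        # done
        tool = tools[decision["tool"]]
        observation = tool(decision["args"])            # (3) act via tool call
        history.append(f"Action: {decision['tool']}({decision['args']})")
        history.append(f"Observation: {observation}")   # (4)-(5) observe, update
    return "Stopped: max steps reached"
```

Everything else in this piece is a variation on this skeleton: what changes is how the "reason about the next action" step is structured.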
ReAct (Yao et al. 2022): interleave chain-of-thought reasoning (Thought) with concrete actions (Action/Observation) in the same context. Why it works: the explicit Thought forces the model to commit to a plan before acting; the Observation grounds the next step in real results; the loop provides error recovery.
| Approach | Reasoning | Grounding | Error recovery | Multi-step |
|---|---|---|---|---|
| Direct answer | None | None | None | No |
| CoT only | Internal | None | Partial | Limited |
| ReAct | Explicit | Tool results | Yes | Yes |
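The ReAct inner loop can be sketched as a parse-execute cycle over the Thought/Action/Observation text format from the paper. The `model` callable and the `tool[input]` action syntax here are illustrative stand-ins:

```python
import re

# ReAct sketch: the model emits "Thought: ..." then "Action: tool[input]";
# we execute the action, append "Observation: ..." and loop. The text format
# mirrors Yao et al. 2022; `model` is a hypothetical stand-in for an LLM call.

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def react(question, model, tools, max_turns=5):
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        output = model(prompt)          # emits Thought + Action (or Finish)
        prompt += output + "\n"
        match = ACTION_RE.search(output)
        if not match:                   # no action parsed -> treat as final answer
            return output
        tool_name, tool_input = match.groups()
        if tool_name == "Finish":
            return tool_input
        observation = tools[tool_name](tool_input)
        prompt += f"Observation: {observation}\n"   # grounds the next Thought
    return "Stopped: max turns reached"
```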
Two-phase: (1) generate a full plan upfront (list of steps), (2) execute each step with a subagent or tool. Planner = LLM call that produces a list of steps; Executor = ReAct agent for each step, passing results forward.
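The two phases reduce to a short driver. A sketch, with hypothetical `planner` and `executor` callables standing in for the two LLM roles:

```python
# Plan-and-execute sketch: the planner produces the full step list upfront
# (auditable before anything runs); each step is then executed in order,
# with prior results passed forward. `planner` and `executor` are
# hypothetical LLM-backed callables.

def plan_and_execute(task, planner, executor):
    steps = planner(task)             # phase 1: full plan, reviewable upfront
    results = []
    for step in steps:
        # phase 2: the executor sees the step plus everything done so far
        result = executor(step, context=results)
        results.append(result)
    return results[-1] if results else None
```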
Advantages: Plan is auditable, explainable, interruptible; enables parallelism (independent steps can run in parallel); stakeholders can review before execution. Disadvantages: upfront plan can be wrong; hard to re-plan mid-execution without restarting; rigid step decomposition fails on emerging needs.
Reflexion (Shinn et al. 2023): after a failed attempt, generate a textual critique, store it in episodic memory, try again. Self-RAG: LLM decides whether to retrieve, generates, critiques its own answer for factuality. The reflection loop: generate → critique → refine → generate again.
| Pattern | Replanning | Auditability | Best for |
|---|---|---|---|
| ReAct | Every step | Medium (trace) | Open-ended tasks |
| Plan-and-execute | On failure only | High (plan visible) | Structured tasks |
| Reflexion | After failure | High (critique stored) | Code, competitive tasks |
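The generate → critique → refine loop behind Reflexion can be sketched as follows; `generate`, `critic`, and `is_success` are hypothetical stand-ins for LLM calls and a task-specific check, and critiques accumulate in a simple episodic-memory list:

```python
# Reflexion sketch: on failure, store a textual self-critique in episodic
# memory and retry with that memory in context. All callables here are
# hypothetical stand-ins.

def reflexion(task, generate, critic, is_success, max_trials=3):
    memory = []                               # episodic memory of critiques
    for _ in range(max_trials):
        attempt = generate(task, memory)      # conditioned on past critiques
        if is_success(attempt):
            return attempt
        memory.append(critic(task, attempt))  # reflect: record what went wrong
    return None                               # all trials failed
```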
Tree of Thoughts (ToT): explore multiple reasoning paths simultaneously, backtrack from dead ends. MCTS for agents: score each candidate action with a value function (another LLM call), expand promising branches. o1/o3 test-time compute: a form of learned tree search — model runs internal reasoning traces and scores them.
Tree search is expensive: N calls per step, where N is the branching factor. Exploring 3 branches at each of 4 levels means 3⁴ = 81 leaf states, and 120 LLM calls once intermediate expansions are counted (3 + 9 + 27 + 81). Reserve it for high-stakes tasks: competitions, formal verification, critical decisions. For most production agent tasks (tool use, RAG), ReAct is roughly 10× cheaper and works just as well.
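A simplified ToT-style search can be written as beam search with an LLM-scored value function (here both `expand` and `score` are hypothetical stand-ins for LLM calls; a full MCTS adds rollouts and backpropagated values on top of this):

```python
import heapq

# Tree-of-Thoughts sketch as beam search: expand each kept state into
# candidate next thoughts, score candidates with a value function (another
# LLM call in practice), keep the best `beam` states per level. Pruning the
# rest is the "backtrack from dead ends" step.
# `expand` and `score` are hypothetical stand-ins.

def tot_beam(root, expand, score, depth=4, beam=3):
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in expand(state)]
        if not candidates:
            break
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)
```

Even this pruned version makes `beam` expand calls plus one score call per candidate at every level, which is where the cost figures above come from.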
Tool selection failure modes: wrong tool chosen, correct tool with wrong args, tool returns error, infinite loop. Mitigations are critical for production reliability.
Tool descriptions: must be unambiguous. Bad: "search". Good: "web_search(query: str) → list of {title, url, snippet}". Validation: validate tool args before calling. Max iterations: enforce a ceiling on loop depth. Error handling: handle tool errors gracefully and return them as observations — don't let exceptions bubble up.
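Those mitigations fit into one guarded dispatch function. A sketch of the shape, not any framework's API; the schema dict and error strings are illustrative:

```python
# Guarded tool dispatch sketch: validate args against a schema before
# calling, catch tool errors and return them as observations so the agent
# can recover, and leave max-iteration enforcement to the outer loop.
# The schema format here is a hypothetical example.

TOOL_SCHEMAS = {
    # web_search(query: str) -> list of {title, url, snippet}
    "web_search": {"query": str},
}

def call_tool(name, args, tools):
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return f"Error: unknown tool '{name}'"      # wrong tool chosen
    for key, typ in schema.items():
        if key not in args or not isinstance(args[key], typ):
            return f"Error: '{name}' requires {key}: {typ.__name__}"  # bad args
    try:
        return tools[name](**args)
    except Exception as e:                          # tool returned an error
        return f"Error: {name} failed: {e}"         # don't let it bubble up
```

Returning errors as strings (instead of raising) feeds the failure back into the agent's context, which is what lets the next reasoning step route around it.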
Agents need multiple memory systems working together. Each serves a different purpose and has different constraints.
Short-term (in-context): conversation history, scratchpad, tool results — limited by context window, cleared at session end.
Long-term (external): vector store of past experiences, episodic memory, fact cache — persistent across sessions, queryable.
Procedural (prompt): the agent's "skills" — encoded in the system prompt, updated via prompt engineering, static within a session.
Checkpoint (persisted state): full graph state saved to a DB (LangGraph pattern) — enables pause/resume across sessions.
| Type | Storage | Lifespan | Best for |
|---|---|---|---|
| In-context | Token window | Current session | Working memory, recent history |
| Vector/episodic | External DB | Persistent | Past task recall, personalization |
| KV cache | GPU VRAM | Request lifetime | Token reuse, prefix caching |
| Checkpointed state | DB (LangGraph) | Persistent | Long-running workflows, resume |
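A minimal sketch of the first two rows working together: a token-bounded short-term buffer plus a persistent episodic store. Naive word-overlap retrieval stands in for an embedding-backed vector store, and all names here are hypothetical:

```python
# Memory sketch: short-term = bounded window of recent messages (fits the
# context budget, cleared per session); episodic = persistent store queried
# by naive word overlap. A real system would use embeddings + a vector DB.

class AgentMemory:
    def __init__(self, window=4):
        self.window = window
        self.short_term = []   # working memory: recent messages only
        self.episodic = []     # persists across sessions

    def add(self, message):
        self.short_term.append(message)
        self.short_term = self.short_term[-self.window:]  # trim to window
        self.episodic.append(message)                     # keep everything

    def recall(self, query, k=2):
        q = set(query.lower().split())
        # rank past experiences by word overlap with the query
        ranked = sorted(self.episodic,
                        key=lambda m: len(q & set(m.lower().split())),
                        reverse=True)
        return ranked[:k]
```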