How LLM agents decide what to do next — ReAct, plan-and-execute, reflection, and tree search
The core loop never changes: (1) receive task, (2) reason about next action, (3) call a tool/act, (4) observe result, (5) update state, repeat until done. This is the heartbeat of every agent.
Without planning: one-shot LLM call to generate everything → brittle, no recovery from errors, fails on unseen edge cases. With planning: multi-step, each step informed by previous results → adaptive, self-correcting, can recover from mistakes by replanning.
Each step depends on the previous observation. The agent reasons explicitly before acting. The observation updates context. This loop enables agents to recover from failures and adapt.
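That loop can be sketched in a few lines. This is a minimal illustration, not any framework's API: `llm` is a hypothetical callable that returns either a tool call or a final answer, and `tools` is a plain dict of callables.

```python
# Minimal agent loop sketch: reason -> act -> observe -> update, until done.
# `llm` and `tools` are hypothetical stand-ins, not a specific library's API.

def run_agent(task, llm, tools, max_steps=10):
    history = [f"Task: {task}"]              # working state: everything seen so far
    for _ in range(max_steps):
        decision = llm("\n".join(history))   # (2) reason about the next action
        if decision["type"] == "final":
            return decision["answer"]        # done
        tool = tools[decision["tool"]]
        observation = tool(decision["args"])            # (3) act via tool call
        history.append(f"Action: {decision['tool']}({decision['args']})")
        history.append(f"Observation: {observation}")   # (4)-(5) observe, update
    return "Stopped: max steps reached"
```

Everything else in this piece is a variation on this skeleton: what changes is how the "reason about the next action" step is structured.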
ReAct (Yao et al. 2022): interleave chain-of-thought reasoning (Thought) with concrete actions (Action/Observation) in the same context. Why it works: the explicit Thought forces the model to commit to a plan before acting; the Observation grounds the next step in real results; the loop provides error recovery.
| Approach | Reasoning | Grounding | Error recovery | Multi-step |
|---|---|---|---|---|
| Direct answer | None | None | None | No |
| CoT only | Internal | None | Partial | Limited |
| ReAct | Explicit | Tool results | Yes | Yes |
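The ReAct inner loop can be sketched as a parse-execute cycle over the Thought/Action/Observation text format from the paper. The `model` callable and the `tool[input]` action syntax here are illustrative stand-ins:

```python
import re

# ReAct sketch: the model emits "Thought: ..." then "Action: tool[input]";
# we execute the action, append "Observation: ..." and loop. The text format
# mirrors Yao et al. 2022; `model` is a hypothetical stand-in for an LLM call.

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def react(question, model, tools, max_turns=5):
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        output = model(prompt)          # emits Thought + Action (or Finish)
        prompt += output + "\n"
        match = ACTION_RE.search(output)
        if not match:                   # no action parsed -> treat as final answer
            return output
        tool_name, tool_input = match.groups()
        if tool_name == "Finish":
            return tool_input
        observation = tools[tool_name](tool_input)
        prompt += f"Observation: {observation}\n"   # grounds the next Thought
    return "Stopped: max turns reached"
```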
Two-phase: (1) generate a full plan upfront (list of steps), (2) execute each step with a subagent or tool. Planner = LLM call that produces a list of steps; Executor = ReAct agent for each step, passing results forward.
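The two phases reduce to a short driver. A sketch, with hypothetical `planner` and `executor` callables standing in for the two LLM roles:

```python
# Plan-and-execute sketch: the planner produces the full step list upfront
# (auditable before anything runs); each step is then executed in order,
# with prior results passed forward. `planner` and `executor` are
# hypothetical LLM-backed callables.

def plan_and_execute(task, planner, executor):
    steps = planner(task)             # phase 1: full plan, reviewable upfront
    results = []
    for step in steps:
        # phase 2: the executor sees the step plus everything done so far
        result = executor(step, context=results)
        results.append(result)
    return results[-1] if results else None
```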
Advantages: Plan is auditable, explainable, interruptible; enables parallelism (independent steps can run in parallel); stakeholders can review before execution. Disadvantages: upfront plan can be wrong; hard to re-plan mid-execution without restarting; rigid step decomposition fails on emerging needs.
Reflexion (Shinn et al. 2023): after a failed attempt, generate a textual critique, store it in episodic memory, try again. Self-RAG: LLM decides whether to retrieve, generates, critiques its own answer for factuality. The reflection loop: generate → critique → refine → generate again.
| Pattern | Replanning | Auditability | Best for |
|---|---|---|---|
| ReAct | Every step | Medium (trace) | Open-ended tasks |
| Plan-and-execute | On failure only | High (plan visible) | Structured tasks |
| Reflexion | After failure | High (critique stored) | Code, competitive tasks |
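The generate → critique → refine loop behind Reflexion can be sketched as follows; `generate`, `critic`, and `is_success` are hypothetical stand-ins for LLM calls and a task-specific check, and critiques accumulate in a simple episodic-memory list:

```python
# Reflexion sketch: on failure, store a textual self-critique in episodic
# memory and retry with that memory in context. All callables here are
# hypothetical stand-ins.

def reflexion(task, generate, critic, is_success, max_trials=3):
    memory = []                               # episodic memory of critiques
    for _ in range(max_trials):
        attempt = generate(task, memory)      # conditioned on past critiques
        if is_success(attempt):
            return attempt
        memory.append(critic(task, attempt))  # reflect: record what went wrong
    return None                               # all trials failed
```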
Tree of Thoughts (ToT): explore multiple reasoning paths simultaneously, backtrack from dead ends. MCTS for agents: score each candidate action with a value function (another LLM call), expand promising branches. o1/o3 test-time compute: a form of learned tree search — model runs internal reasoning traces and scores them.
Tree search is expensive: N calls per step, where N is the branching factor. Exploring 3 branches at each of 4 levels means 3⁴ = 81 leaf states, and 120 LLM calls once intermediate expansions are counted (3 + 9 + 27 + 81). Reserve it for high-stakes tasks: competitions, formal verification, critical decisions. For most production agent tasks (tool use, RAG), ReAct is roughly 10× cheaper and works just as well.
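A simplified ToT-style search can be written as beam search with an LLM-scored value function (here both `expand` and `score` are hypothetical stand-ins for LLM calls; a full MCTS adds rollouts and backpropagated values on top of this):

```python
import heapq

# Tree-of-Thoughts sketch as beam search: expand each kept state into
# candidate next thoughts, score candidates with a value function (another
# LLM call in practice), keep the best `beam` states per level. Pruning the
# rest is the "backtrack from dead ends" step.
# `expand` and `score` are hypothetical stand-ins.

def tot_beam(root, expand, score, depth=4, beam=3):
    frontier = [root]
    for _ in range(depth):
        candidates = [c for state in frontier for c in expand(state)]
        if not candidates:
            break
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)
```

Even this pruned version makes `beam` expand calls plus one score call per candidate at every level, which is where the cost figures above come from.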
Tool selection failure modes: wrong tool chosen, correct tool with wrong args, tool returns error, infinite loop. Mitigations are critical for production reliability.
Tool descriptions: must be unambiguous. Bad: "search". Good: "web_search(query: str) → list of {title, url, snippet}". Validation: validate tool args before calling. Max iterations: enforce a ceiling on loop depth. Error handling: handle tool errors gracefully and return them as observations — don't let exceptions bubble up.
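Those mitigations fit into one guarded dispatch function. A sketch of the shape, not any framework's API; the schema dict and error strings are illustrative:

```python
# Guarded tool dispatch sketch: validate args against a schema before
# calling, catch tool errors and return them as observations so the agent
# can recover, and leave max-iteration enforcement to the outer loop.
# The schema format here is a hypothetical example.

TOOL_SCHEMAS = {
    # web_search(query: str) -> list of {title, url, snippet}
    "web_search": {"query": str},
}

def call_tool(name, args, tools):
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return f"Error: unknown tool '{name}'"      # wrong tool chosen
    for key, typ in schema.items():
        if key not in args or not isinstance(args[key], typ):
            return f"Error: '{name}' requires {key}: {typ.__name__}"  # bad args
    try:
        return tools[name](**args)
    except Exception as e:                          # tool returned an error
        return f"Error: {name} failed: {e}"         # don't let it bubble up
```

Returning errors as strings (instead of raising) feeds the failure back into the agent's context, which is what lets the next reasoning step route around it.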
Agents need multiple memory systems working together. Each serves a different purpose and has different constraints.
Short-term (in-context): conversation history, scratchpad, tool results — limited by context window, cleared at session end.
Long-term (external): vector store of past experiences, episodic memory, fact cache — persistent across sessions, queryable.
Procedural (prompt): the agent's "skills" — encoded in the system prompt, updated via prompt engineering, static within a session.
Checkpoint (persisted state): full graph state saved to a DB (LangGraph pattern) — enables pause/resume across sessions.
| Type | Storage | Lifespan | Best for |
|---|---|---|---|
| In-context | Token window | Current session | Working memory, recent history |
| Vector/episodic | External DB | Persistent | Past task recall, personalization |
| KV cache | GPU VRAM | Request lifetime | Token reuse, prefix caching |
| Checkpointed state | DB (LangGraph) | Persistent | Long-running workflows, resume |
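A minimal sketch of the first two rows working together: a token-bounded short-term buffer plus a persistent episodic store. Naive word-overlap retrieval stands in for an embedding-backed vector store, and all names here are hypothetical:

```python
# Memory sketch: short-term = bounded window of recent messages (fits the
# context budget, cleared per session); episodic = persistent store queried
# by naive word overlap. A real system would use embeddings + a vector DB.

class AgentMemory:
    def __init__(self, window=4):
        self.window = window
        self.short_term = []   # working memory: recent messages only
        self.episodic = []     # persists across sessions

    def add(self, message):
        self.short_term.append(message)
        self.short_term = self.short_term[-self.window:]  # trim to window
        self.episodic.append(message)                     # keep everything

    def recall(self, query, k=2):
        q = set(query.lower().split())
        # rank past experiences by word overlap with the query
        ranked = sorted(self.episodic,
                        key=lambda m: len(q & set(m.lower().split())),
                        reverse=True)
        return ranked[:k]
```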