Interleave reasoning (Thought) and action (Act) steps with observed results: the foundational prompt pattern for tool-using agents.
A calculator can compute, but it doesn't know when to compute. A search engine can retrieve, but it doesn't know what to search for. ReAct is the pattern that connects language models to tools: it lets the model decide what action to take based on reasoning, observe the result, then reason about the next action.
ReAct stands for Reasoning + Acting. The model alternates between "thinking out loud" (Thought) and "calling a tool" (Action), with the tool's result (Observation) fed back in before the next thought. This loop continues until the model has enough information to give a final answer.
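The loop can be sketched in a few lines. This is a minimal illustration, not any framework's API: `llm` is any callable mapping a prompt string to model text, `tools` maps tool names to callables, and the `Thought:`/`Action:`/`Observation:`/`Final Answer:` labels are the conventional ReAct transcript format.

```python
import re

def react_loop(llm, tools, question, max_steps=8):
    """Minimal ReAct loop: alternate model turns and tool calls until the
    model emits a Final Answer or the step budget runs out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)  # model emits a Thought plus an Action or a final answer
        transcript += output + "\n"
        final = re.search(r"Final Answer:\s*(.+)", output)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*)\]", output)
        if action:
            name, arg = action.group(1), action.group(2)
            result = tools[name](arg) if name in tools else f"Unknown tool: {name}"
            transcript += f"Observation: {result}\n"  # feed the result back before the next thought
    return None  # step budget exhausted without a final answer
```

The essential property is that each `Observation:` line is appended to the transcript before the model's next turn, so every new thought can condition on everything observed so far.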
Two things make ReAct better than pure retrieval or pure reasoning: reasoning guides the actions, so the model decides what to look up next based on what it has learned so far, and observations ground the reasoning, so each thought works from real results rather than hallucinated facts.
This makes ReAct especially powerful for multi-hop questions ("who was the president when X happened and what did they do about Y") where each answer creates the next question.
ReAct's interleaved reasoning-action pattern differs from pure chain-of-thought (reasoning only, no actions) and pure tool-use (actions without explicit reasoning traces). The key advantage of ReAct over tool-use without reasoning is interpretability β the thought trace before each action reveals why the agent chose that action, making debugging significantly easier when the agent makes incorrect tool calls. Compared to AutoGPT-style agents that plan extensively before acting, ReAct's step-by-step reasoning-action alternation allows the agent to update its plan based on actual observation results rather than committing to a full plan upfront.
| Pattern | Reasoning | Actions | Best for |
|---|---|---|---|
| Chain-of-Thought | Explicit step-by-step | None | Mathematical reasoning |
| Tool-use only | Implicit | Yes | Simple single-tool tasks |
| ReAct | Explicit per-action | Yes, interleaved | Multi-step tool use |
| Plan-and-Execute | Full upfront plan | Sequential execution | Well-defined complex tasks |
The most common ReAct failure mode is reasoning hallucination: the model generates a plausible-looking thought that does not correctly reflect the available information, leading to incorrect tool calls that compound errors across subsequent steps. Mitigation strategies include: limiting reasoning to one inference step per action (preventing long reasoning chains from drifting), requiring the model to quote relevant information from previous observations in its reasoning, and implementing observation validation that flags when the model's stated belief contradicts what was actually observed in previous steps.
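The "require quoting" mitigation can be enforced mechanically. The sketch below is one crude way to do it, assuming thoughts mark quoted material with double quotes; it simply checks that every quoted span appears verbatim in some earlier observation.

```python
import re

def thought_is_grounded(thought, observations):
    """Crude grounding check for the 'require quoting' mitigation: the
    thought must contain at least one double-quoted span, and every
    quoted span must appear verbatim in some earlier observation."""
    quotes = re.findall(r'"([^"]+)"', thought)
    return bool(quotes) and all(
        any(q in obs for obs in observations) for q in quotes
    )
```

A thought that fails this check can be rejected and regenerated, or flagged for the observation-validation step described above. Verbatim matching is deliberately strict; fuzzier matching trades fewer false alarms for weaker guarantees.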
Tool design for ReAct agents significantly affects agent reliability. Tools with narrow, well-defined input schemas produce fewer malformed tool calls than tools with flexible or ambiguous parameters. Including parameter validation with informative error messages, rather than silently accepting invalid inputs, helps the agent self-correct when it provides incorrect arguments, because the error message in the observation feeds back to the reasoning step where the model can identify the problem. Each tool should have a single clear purpose; combining multiple actions in one tool increases the probability of partial success states that confuse the agent's subsequent reasoning.
Maximum step limits are a critical safety mechanism for production ReAct agents: they prevent runaway loops from consuming unbounded tokens and incurring unlimited costs. Most ReAct frameworks accept a max_iterations or max_steps parameter that terminates the agent after a configurable number of thought-action-observation cycles. Setting this limit requires calibrating against the expected step count distribution for legitimate tasks; too low causes premature termination on complex tasks, too high allows runaway agents to consume excessive resources before stopping. Logging the distribution of step counts in production and setting the limit at the 99th percentile plus a buffer provides a data-driven approach to limit calibration.
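The calibration step is a one-liner over logged step counts. A minimal sketch using the standard library, with an illustrative fixed buffer of two steps (the buffer policy is an assumption, not a standard):

```python
import statistics

def calibrate_max_steps(step_counts, percentile=99, buffer=2):
    """Data-driven max_steps: the given percentile of logged step counts
    for successful runs, plus a fixed buffer."""
    # statistics.quantiles with n=100 yields the 1st..99th percentile cut points
    cuts = statistics.quantiles(step_counts, n=100, method="inclusive")
    return int(cuts[percentile - 1]) + buffer
```

The resulting value is what gets passed as the framework's max_iterations/max_steps parameter; recalibrate periodically as the task mix shifts.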
Prompt engineering for ReAct agents is substantially more complex than for single-turn prompts because the prompt must define the tool interface, reasoning format, and stopping criteria in addition to the task objective. The most reliable ReAct prompts use few-shot examples that demonstrate the exact thought-action-observation format for representative tasks, including examples of successful completion and examples of the agent recognizing when it cannot complete a task and gracefully stopping. Models that have been instruction-tuned for tool use (such as GPT-4 with function calling or Claude with tool use) require less prompt engineering than base models because the tool-use format is part of their instruction tuning.
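A few-shot ReAct prompt along these lines might look as follows. The tool, format labels, and second (graceful-stop) example are illustrative; the first example's film facts are accurate but chosen only for demonstration.

```python
REACT_PROMPT = """\
Answer the question using the tools below. Use this exact format.

Tools:
  search[query] -- look up a fact on the web

Question: Who directed the film that won Best Picture in 1998?
Thought: I need the 1998 Best Picture winner first.
Action: search[Best Picture winner 1998]
Observation: Titanic won Best Picture at the 1998 ceremony.
Thought: Now I need Titanic's director.
Action: search[Titanic 1997 film director]
Observation: Titanic was directed by James Cameron.
Thought: I have enough information to answer.
Final Answer: James Cameron

Question: What is the population of Atlantis?
Thought: I should check whether this is answerable.
Action: search[population of Atlantis]
Observation: No reliable results; Atlantis is a legendary island.
Thought: The question has no factual answer, so I should stop.
Final Answer: I cannot answer this; Atlantis is not a real place.

Question: {question}
"""
```

Note the two kinds of demonstration the text calls for: one successful multi-hop completion and one example of recognizing an unanswerable task and stopping gracefully.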
ReAct agent evaluation requires task-specific success metrics beyond simple completion rate. For information-seeking tasks, precision and recall of the retrieved information relative to a reference answer measures quality. For action-taking tasks, task completion rate and the number of steps taken (efficiency) measure both success and cost. Trajectory evaluation (examining the full thought-action-observation sequence rather than just the final output) reveals systematic reasoning errors that completion metrics miss, such as consistently choosing suboptimal tools or making the same logical error on similar task types. Recording and analyzing agent trajectories in production is essential for identifying improvement opportunities.
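These metrics are straightforward to compute once trajectories are recorded. A sketch under the assumption that facts are normalized into hashable items (e.g. lowercased strings) and a trajectory is a list of (thought, tool_name, observation) triples; both representations are choices of the evaluator, not a standard:

```python
from collections import Counter

def retrieval_precision_recall(retrieved, reference):
    """Precision/recall of retrieved facts against a reference answer set,
    for information-seeking tasks."""
    hits = len(set(retrieved) & set(reference))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

def trajectory_stats(trajectory):
    """Step count and per-tool usage from (thought, tool_name, observation)
    triples; aggregating these across tasks surfaces tool-choice patterns."""
    return {"steps": len(trajectory),
            "tool_counts": Counter(tool for _, tool, _ in trajectory)}
```

Aggregating `trajectory_stats` over many runs is what exposes the systematic errors mentioned above, such as one tool being chosen far more often than the task mix warrants.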
Structured output constraints can improve ReAct agent reliability by enforcing a specific JSON format for actions that is parsed programmatically rather than extracted from free text. Having the model output actions as {"tool": "search", "query": "..."} rather than "I will call the search tool with query ..." eliminates the free-text parsing step that introduces errors when the model deviates from the expected format. JSON-mode or function calling interfaces natively support this structured output approach, and models with native function calling support (GPT-4, Claude) produce more reliable structured outputs than models that must be prompted to produce JSON through the system prompt alone.
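Parsing and validating such a structured action is a small amount of code. A sketch in which the returned error string is designed to be fed back as an Observation so the agent can retry; the two-field action shape follows the example above.

```python
import json

def parse_action(raw, allowed_tools):
    """Parse a model-emitted action like {"tool": "search", "query": "..."}.
    Returns (action_dict, None) on success or (None, error_message)."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON: {e.msg}"
    if not isinstance(action, dict) or "tool" not in action:
        return None, "Action must be a JSON object with a 'tool' field."
    if action["tool"] not in allowed_tools:
        return None, f"Unknown tool {action['tool']!r}; allowed: {sorted(allowed_tools)}."
    return action, None
```

With native function calling, the provider's API performs this parsing and schema enforcement for you; explicit parsing like this is mainly needed when prompting a model to emit JSON as plain text.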