An agent architecture that interleaves Reasoning and Acting — the model explicitly thinks through what to do, calls a tool, observes the result, then reasons again until the task is complete.
A chain is a pre-wired sequence: prompt → LLM → parse → next prompt. The path is fixed before you run it. An agent is different: the LLM itself decides at each step what to do next — it might call a search tool, then a calculator, then write a final answer, or it might loop back and search again if it realises its first result was wrong.
This dynamic control flow is what makes agents powerful: they can handle tasks where you don't know in advance how many steps are needed or which tools will be required. The cost is unpredictability — an agent might take a wrong path, loop, or call expensive tools unnecessarily.
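The contrast can be sketched in a few lines of Python. `llm` and `tools` here are hypothetical stand-ins for a model call and a tool registry, not a real API:

```python
# A chain: the sequence of steps is fixed before execution.
def run_chain(question, llm):
    draft = llm(f"Restate the question: {question}")  # step 1 always runs
    return llm(f"Answer this: {draft}")               # step 2 always runs

# An agent: the model chooses the next step at runtime and may loop.
def run_agent(question, llm, tools, max_steps=5):
    context = question
    for _ in range(max_steps):
        decision = llm(f"Decide the next step for: {context}")
        if decision.startswith("answer:"):            # model chose to finish
            return decision[len("answer:"):].strip()
        name, _, arg = decision.partition(":")        # e.g. "search: AAPL price"
        context += "\n" + tools[name.strip()](arg.strip())
    return "Gave up after max_steps."
```

The chain's call graph is known before the first token is generated; the agent's is not.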
ReAct (Reasoning + Acting) is the most widely used agent pattern. The model explicitly articulates its reasoning before each action, making the control flow transparent and debuggable.
Each step of a ReAct agent follows this pattern:

```text
Thought: I need to find the current price of Apple stock.
Action: search("AAPL stock price today")
Observation: AAPL is trading at $189.42 as of 3:45 PM ET.
Thought: Now I have the price. The user didn't ask for a specific
currency, so I can answer in USD.
Action: final_answer("Apple (AAPL) is currently trading at $189.42 USD.")
```
The model produces Thought → Action pairs; your code runs the action and appends the Observation; the cycle repeats until the model calls final_answer or reaches a max step limit.
The Thought step is crucial: it forces the model to plan before acting, catching errors ("wait, I should check X first") that a pure action model would miss.
```python
import anthropic, re

client = anthropic.Anthropic()

# Define tools
def search(query: str) -> str:
    """Simulate a web search (replace with a real search API)."""
    results = {
        "AAPL stock price": "Apple (AAPL) is at $189.42",
        "Python creator": "Python was created by Guido van Rossum in 1991",
    }
    for key, val in results.items():
        if key.lower() in query.lower():
            return val
    return "No results found."

def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression with builtins disabled."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

TOOLS = {"search": search, "calculator": calculator}
```
```python
SYSTEM = '''You are a helpful assistant with access to tools.

To use a tool, respond with EXACTLY this format:
Thought: <your reasoning about what to do next>
Action: <tool_name>("<input>")

When you have a final answer, respond with:
Thought: <your reasoning>
Final Answer: <your answer>

Available tools: search(query), calculator(expression)'''
```
```python
def run_react_agent(user_query: str, max_steps: int = 8) -> str:
    messages = [{"role": "user", "content": user_query}]
    for step in range(max_steps):
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=SYSTEM,
            messages=messages,
        )
        text = response.content[0].text
        messages.append({"role": "assistant", "content": text})

        # Check for final answer
        if "Final Answer:" in text:
            return text.split("Final Answer:")[-1].strip()

        # Parse action: tool_name("input")
        action_match = re.search(r'Action:\s*(\w+)\("([^"]*)"\)', text)
        if action_match:
            tool_name, tool_input = action_match.group(1), action_match.group(2)
            if tool_name in TOOLS:
                observation = TOOLS[tool_name](tool_input)
            else:
                observation = f"Unknown tool: {tool_name}"
            messages.append({"role": "user", "content": f"Observation: {observation}"})
        else:
            # Model neither called a tool nor gave a final answer; nudge it
            messages.append({"role": "user", "content": "Please either use a tool or provide a Final Answer."})
    return "Max steps reached without a final answer."

print(run_react_agent("What is AAPL's stock price multiplied by 100?"))
```
```python
from langchain.agents import create_react_agent, AgentExecutor
from langchain_anthropic import ChatAnthropic
from langchain import hub
from langchain.tools import tool

@tool
def search(query: str) -> str:
    '''Search the web for current information.'''
    # Replace with actual search implementation
    return f"Search results for: {query}"

@tool
def calculator(expression: str) -> str:
    '''Evaluate arithmetic expressions with builtins disabled.'''
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

tools = [search, calculator]

# Pull the standard ReAct prompt from LangChain Hub
prompt = hub.pull("hwchase17/react")
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
agent = create_react_agent(llm, tools, prompt)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # prints Thought/Action/Observation at each step
    max_iterations=10,
    handle_parsing_errors=True,
)

result = executor.invoke({"input": "What is 42 times 7?"})
print(result["output"])
```
The quality of a ReAct agent depends heavily on tool design. A bad tool description leads to wrong tool calls.
Be precise about input format:
```python
@tool
def get_weather(location: str) -> str:
    '''Get current weather for a city.

    Args:
        location: City name and optional country code, e.g. "London, UK" or "Tokyo"

    Returns:
        Current temperature in Celsius and weather description.
    '''
    # implementation
```
Always return structured, parseable output:
```python
import json

@tool
def get_stock_price(ticker: str) -> str:
    '''Get the current stock price for a ticker symbol like AAPL, MSFT, GOOGL.'''
    # Return a structured string the model can parse
    return json.dumps({"ticker": ticker, "price": 189.42,
                       "currency": "USD", "timestamp": "2024-03-15T15:45:00Z"})
```
Error messages should guide the next action:
```python
return f"Error: ticker '{ticker}' not found. Try the full company name or check spelling."
# Don't return: "Error code 404"
```
Problem: Model loops endlessly. Set a hard max_iterations limit. Add a step counter to the context: "You have used X of Y allowed steps." This creates urgency that breaks loops.
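One way to implement the counter, assuming observations are appended as user messages as in the from-scratch loop above (the helper name is illustrative):

```python
def observation_with_budget(observation: str, step: int, max_steps: int) -> str:
    """Attach a step-budget reminder to every observation to discourage loops."""
    return (
        f"Observation: {observation}\n"
        f"(Step {step + 1} of {max_steps}. "
        f"Give a Final Answer before the step budget runs out.)"
    )
```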
Problem: Model ignores tool output. Your observation formatting matters. Instead of "42", return "The result of 6 × 7 is 42." Complete sentences in observations anchor the model better than bare values.
Problem: Wrong tool called. Tool names are confusing. Use clear, action-oriented names: search_web not search, execute_python_code not run. Add examples in the docstring.
Problem: Expensive tool called unnecessarily. Add a check_if_answer_known tool as the first option, or add a rule: "Always state what information you already have before deciding to call a tool."
Context window fills up fast. Each Thought/Action/Observation adds tokens. A 10-step agent with verbose tool outputs can easily consume 20k tokens. Use a summariser to compress earlier steps, or limit observation length.
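A minimal sketch of the compression idea against the message list from the from-scratch loop. A real summariser would call the model; here older observations are simply truncated:

```python
def compact_history(messages, keep_last=4, max_obs_chars=500):
    """Keep the first message (the task) and the last few turns verbatim;
    truncate observations in older turns to cap context growth."""
    if len(messages) <= keep_last + 1:
        return messages
    compacted = [messages[0]]
    for msg in messages[1:-keep_last]:
        content = msg["content"]
        if content.startswith("Observation:") and len(content) > max_obs_chars:
            content = content[:max_obs_chars] + " ...[truncated]"
        compacted.append({**msg, "content": content})
    return compacted + messages[-keep_last:]
```

Keeping the most recent turns intact matters: the model needs them verbatim to decide the next step.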
Parsing is fragile. ReAct relies on regex parsing of the model's output. Models occasionally deviate from the format. Implement robust error handling: if parsing fails, append the raw response and ask the model to reformat. LangChain's handle_parsing_errors=True does this automatically.
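A sketch of the recovery step, reusing the action regex from the from-scratch loop; on a parse failure the nudge is sent back instead of crashing:

```python
import re

ACTION_RE = re.compile(r'Action:\s*(\w+)\("([^"]*)"\)')

def parse_action(text: str):
    """Return (tool_name, tool_input), or None if the model broke format."""
    match = ACTION_RE.search(text)
    return (match.group(1), match.group(2)) if match else None

# Appended as a user message whenever parse_action returns None:
REFORMAT_NUDGE = (
    "Your last reply did not match the required format. Respond again with "
    'either Action: tool_name("input") or Final Answer: <answer>.'
)
```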
Non-determinism compounds. Each step adds stochasticity. A 5-step agent with 90% per-step accuracy has only 59% end-to-end accuracy. Use lower temperature (0.0–0.3) for multi-step agents, and add verification steps.
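The compounding arithmetic is worth internalising; assuming independent per-step success:

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every step in the trajectory succeeds."""
    return per_step ** steps

print(f"{end_to_end_success(0.90, 5):.1%}")   # 5 steps at 90% per step -> ~59%
print(f"{end_to_end_success(0.90, 10):.1%}")  # 10 steps -> ~35%
```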
Tool errors must be handled gracefully. If a tool raises an exception, return a string error message — don't let the exception propagate and kill the agent loop. The model should be able to recover from tool failures and try a different approach.
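One way to enforce this uniformly is a decorator that converts exceptions into readable observations (a sketch; `divide` is a toy tool for illustration):

```python
import functools

def safe_tool(fn):
    """Convert tool exceptions into error strings the agent can read and recover from."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return (f"Tool error ({type(exc).__name__}): {exc}. "
                    f"Try different input or a different tool.")
    return wrapper

@safe_tool
def divide(expression: str) -> str:
    a, b = expression.split("/")
    return str(int(a) / int(b))
```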
| Design Decision | Options | Tradeoff |
|---|---|---|
| Thought format | Free text vs structured XML | Free text is more flexible; XML is easier to parse |
| Max iterations | 5–20 steps | Higher = more complex tasks; lower = fewer runaway loops |
| Tool result truncation | 200–2000 tokens | More context = better reasoning; less = lower cost |
| Stop condition | Final answer vs tool signal | Explicit signal is more reliable |
| Memory | Full history vs rolling window | Full history is most accurate; window controls cost |
The most common ReAct failure mode is "reasoning drift": the agent's reasoning in the Thought step gradually diverges from the actual task goal over many iterations. Mitigate this by injecting the original task goal at the start of each Thought block, e.g. prefixing the prompt with `Task: {original_task}` followed by `Thought:`. This re-anchors the agent to the goal and reduces off-task tangents. For long-running ReAct agents, also add a periodic "progress check" step that explicitly asks the agent to assess whether it is making progress toward the goal.
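A sketch of the re-anchoring prompt, where `original_task` is whatever the user first asked (the helper name is illustrative):

```python
def anchored_turn(original_task: str, observation: str) -> str:
    """Re-inject the original goal at the start of every turn to counter drift."""
    return (
        f"Task: {original_task}\n"
        f"Observation: {observation}\n"
        f"Before your next Thought, confirm this still serves the task.\n"
        f"Thought:"
    )
```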
For production ReAct agents, add structured logging of every Thought/Action/Observation cycle with timestamps and token counts. This log is your primary debugging tool when an agent produces an unexpected result. Review failed runs by reading the thought trace: the failure point is almost always visible as a step where the reasoning diverged from the task goal, a tool returned unexpected output, or the agent prematurely concluded it had finished. Store these traces for at least 30 days to enable root-cause analysis of reported incidents.
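A minimal JSON-lines logger for the cycle, as a sketch; `out` can be a file opened in append mode:

```python
import json, time

def log_cycle(out, step, thought, action, observation, tokens):
    """Append one Thought/Action/Observation cycle as a JSON line to `out`."""
    out.write(json.dumps({
        "ts": time.time(),
        "step": step,
        "thought": thought,
        "action": action,
        "observation": observation[:500],  # cap stored observation size
        "tokens": tokens,
    }) + "\n")
```

One record per line makes traces easy to grep and to load into analysis tools.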
Enforce a consistent scratchpad format per ReAct cycle: Thought (one concise sentence about the current goal), Action (tool name plus JSON arguments), Observation (truncated tool result). Truncate observations aggressively, to around 200 tokens, without losing key information: verbose tool output wastes context window space and makes it harder for the model to maintain coherent reasoning across many steps.
For agents that handle sensitive domains (legal, medical, financial), add a pre-action review step before executing any tool call. The review step prompts a separate model instance to evaluate whether the planned action is safe, appropriate, and consistent with the stated user intent. This adds 200-400ms latency but significantly reduces the risk of irreversible harmful actions. Route flagged actions to human review rather than blocking them outright, to preserve recall while protecting against high-severity errors.
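A toy version of the review gate. In production the reviewer would be a separate model call; here a keyword heuristic stands in, purely to show the allow/flag routing:

```python
RISKY_VERBS = ("delete", "transfer", "send", "execute", "drop")

def review_action(tool_name: str, tool_input: str, user_intent: str) -> str:
    """Return "allow" or "flag"; flagged actions route to human review, not a hard block."""
    action_text = f"{tool_name} {tool_input}".lower()
    for verb in RISKY_VERBS:
        if verb in action_text and verb not in user_intent.lower():
            return "flag"  # risky verb the user never asked for
    return "allow"
```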