An agent architecture that separates planning from execution — a planner LLM creates a multi-step plan, then an executor carries out each step, with optional replanning based on results.
In ReAct, the same model does everything: it decides what to do, does it, evaluates the result, and decides the next step — all interleaved. For short tasks (2–3 steps) this works well. For longer tasks (10+ steps), it breaks down: the model gets distracted by intermediate results and loses sight of the original goal, and the growing context window makes each step more expensive and slower.
Plan-and-Execute separates concerns: a planner model sees the task and produces a complete step-by-step plan upfront (high-level, goal-oriented thinking). An executor model then carries out each step independently (local, focused thinking). This mirrors how humans work: a manager defines the project plan; individual contributors execute tasks without needing the full strategic context.
User query
     │
     ▼
┌─────────────┐
│   Planner   │ → Creates ordered list of steps
│    (LLM)    │   e.g.: ["Search for X", "Summarise results",
└─────────────┘   "Format as report", "Email to team"]
     │
     ▼ step list
┌─────────────┐
│  Executor   │ → Executes each step with tools
│    (LLM)    │ → Accumulates results in shared state
└─────────────┘
     │
     ├── step 1 result → [optional: Replanner checks if plan still valid]
     ├── step 2 result
     ├── ...
     └── final result → User
The executor uses a compact context: just the current step + accumulated results, not the full history. This keeps each execution cheap and focused.
import anthropic
import json
client = anthropic.Anthropic()
def create_plan(task: str) -> list[str]:
    '''Ask the planner LLM to decompose the task into concrete steps.'''
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": f'''You are a task planner. Break down the following task
into a numbered list of concrete, executable steps. Each step should be a single,
independent action. Be specific about what information is needed at each step.

Task: {task}

Return ONLY a JSON array of step strings, e.g.:
["Search for recent sales data", "Calculate total revenue", "Create a summary"]'''}],
    )
    text = response.content[0].text.strip()
    # Extract the JSON array from the response, ignoring any surrounding prose
    start = text.find('[')
    end = text.rfind(']') + 1
    return json.loads(text[start:end])
# Test
task = "Research the top 3 Python web frameworks, compare their GitHub stars, and write a recommendation."
plan = create_plan(task)
for i, step in enumerate(plan, 1):
    print(f"Step {i}: {step}")
# Step 1: Search for Flask GitHub repository and get star count
# Step 2: Search for Django GitHub repository and get star count
# Step 3: Search for FastAPI GitHub repository and get star count
# Step 4: Compare the three frameworks based on stars, use cases, and community
# Step 5: Write a concise recommendation paragraph
import anthropic
client = anthropic.Anthropic()
def execute_step(step: str, context: str, tools: list) -> str:
    '''Execute a single plan step, given accumulated context.'''
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",  # cheaper model for execution
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": f'''Previous results:
{context if context else "None yet."}

Current step to execute: {step}

Use tools as needed to complete this step. Return a concise result.'''}],
    )
    # Handle tool calls recursively (simplified)
    if response.stop_reason == "tool_use":
        # ... handle tool calls as shown in Tool Use page ...
        pass
    return response.content[0].text if response.content else ""
def run_plan_execute(task: str, tools: list) -> str:
    # Phase 1: Plan
    plan = create_plan(task)
    print(f"Plan created: {len(plan)} steps")

    # Phase 2: Execute
    context = ""
    results = []
    for i, step in enumerate(plan, 1):
        print(f"\nExecuting step {i}/{len(plan)}: {step}")
        result = execute_step(step, context, tools)
        results.append(f"Step {i} ({step}): {result}")
        context = "\n".join(results)  # accumulate results as context
        print(f"Result: {result[:100]}...")
    return context  # final accumulated result
final = run_plan_execute(
    "Compare Flask and FastAPI GitHub stars and write a recommendation.",
    tools=[]  # add your real tools here
)
print("\nFinal result:")
print(final)
def replan_if_needed(original_plan: list[str], completed_steps: list[str],
                     last_result: str, remaining_steps: list[str]) -> list[str]:
    '''Check if the plan needs updating based on new information.'''
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": f'''
Original plan: {original_plan}
Completed steps: {completed_steps}
Last result: {last_result}
Remaining steps: {remaining_steps}

Does the last result reveal that the remaining steps need to be adjusted?
If yes, return a JSON array with the updated remaining steps.
If no, return the remaining steps unchanged as a JSON array.'''}],
    )
    text = response.content[0].text.strip()
    start, end = text.find('['), text.rfind(']') + 1
    return json.loads(text[start:end])
# Use in the executor loop after each step result:
# remaining = replan_if_needed(plan, completed, result, remaining)
Replanning is optional but valuable for tasks where intermediate results reveal the original plan was wrong (e.g., a search returns no results, requiring a different search strategy).
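The replanning check slots into the executor loop between steps. Here is a minimal sketch of that loop, with execution and replanning injected as callables so the control flow is testable independently of any LLM client (`run_with_replanning` and its parameter names are illustrative, not from the code above):

```python
from typing import Callable

def run_with_replanning(plan: list[str],
                        execute_fn: Callable[[str, str], str],
                        replan_fn: Callable[[list[str], list[str], str, list[str]], list[str]]) -> str:
    """Execute a plan step by step, letting replan_fn rewrite the tail after each result."""
    completed: list[str] = []
    results: list[str] = []
    remaining = list(plan)
    while remaining:
        step = remaining.pop(0)
        result = execute_fn(step, "\n".join(results))
        completed.append(step)
        results.append(f"{step}: {result}")
        # Give the replanner a chance to rewrite the remaining steps
        remaining = replan_fn(plan, completed, result, remaining)
    return "\n".join(results)
```

In the full system, `execute_fn` would wrap `execute_step` and `replan_fn` would wrap `replan_if_needed`.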
from langchain_experimental.plan_and_execute import (
PlanAndExecute, load_agent_executor, load_chat_planner
)
from langchain_anthropic import ChatAnthropic
from langchain.tools import tool
@tool
def web_search(query: str) -> str:
    '''Search the web for current information.'''
    return f"Search results for: {query}"

@tool
def calculator(expression: str) -> str:
    '''Evaluate arithmetic.'''
    return str(eval(expression, {"__builtins__": {}}, {}))
tools = [web_search, calculator]
# Planner uses a more capable model; executor uses a faster one
planner_llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
executor_llm = ChatAnthropic(model="claude-3-5-haiku-20241022")
planner = load_chat_planner(planner_llm)
executor = load_agent_executor(executor_llm, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
result = agent.run("What is the square root of the number of days in a non-leap year?")
print(result)
Plans can be over-optimistic. The planner generates a plan without knowing what the executor will actually find. Step 3 might depend on data that Step 2 discovers doesn't exist. Replanning (or at minimum, having the executor handle "step not achievable" gracefully) is essential for robustness.
Context accumulation blows up. If you concatenate all step results into the executor's context, by step 10 you're sending thousands of tokens for background that may be irrelevant. Use a summariser after every 3–4 steps to compress history.
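One way to bound the context is to keep only the last few step results verbatim and fold everything older into a running summary. A sketch, with the summariser injected as a callable (in practice it would be a cheap LLM call; `compress_context` is an illustrative name):

```python
from typing import Callable

def compress_context(results: list[str],
                     summarise: Callable[[str], str],
                     keep_last: int = 3) -> str:
    """Summarise all but the most recent results; keep recent ones verbatim."""
    if len(results) <= keep_last:
        return "\n".join(results)
    older, recent = results[:-keep_last], results[-keep_last:]
    summary = summarise("\n".join(older))
    return f"Summary of earlier steps:\n{summary}\n\nRecent steps:\n" + "\n".join(recent)
```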
Planner and executor need aligned vocabularies. If the planner says "Search GitHub API" but the executor only has a generic web_search tool, the step may succeed but produce the wrong format. Either make the executor tools match the planner's vocabulary or give the planner a tool manifest to plan against.
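A lightweight way to align vocabularies is to render the executor's tool schemas into a manifest that gets prepended to the planner prompt, so every planned step maps to a tool that actually exists. A sketch (`build_tool_manifest` is an illustrative helper; the tool dicts follow the Anthropic tools schema):

```python
def build_tool_manifest(tools: list[dict]) -> str:
    """Render tool names and descriptions so the planner only plans achievable steps."""
    lines = ["Available tools (plan ONLY with these):"]
    for t in tools:
        params = ", ".join(t.get("input_schema", {}).get("properties", {}))
        lines.append(f"- {t['name']}({params}): {t['description']}")
    return "\n".join(lines)

# Prepend the manifest to the planner prompt
tools = [{
    "name": "web_search",
    "description": "Search the web for current information.",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
}]
manifest = build_tool_manifest(tools)
```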
Use a cheap model for execution. Each step is a small, focused task. Using Claude Sonnet for execution (when Haiku would suffice) wastes money. Reserve the expensive model for planning, where quality matters most.
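The savings are easy to estimate back-of-the-envelope. A sketch of the arithmetic (the per-million-token prices below are assumptions for illustration only; check current pricing before relying on them):

```python
# Illustrative per-million-token prices (ASSUMED, not authoritative)
PRICES = {"sonnet": {"in": 3.00, "out": 15.00}, "haiku": {"in": 0.80, "out": 4.00}}

def run_cost(model: str, steps: int, in_tokens: int, out_tokens: int) -> float:
    """Rough dollar cost of executing `steps` steps, each with the given token counts."""
    p = PRICES[model]
    return steps * (in_tokens * p["in"] + out_tokens * p["out"]) / 1_000_000
```

For a 10-step plan at ~2,000 input and ~500 output tokens per step, the ratio between the two models matters more than the absolute numbers.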
| Architecture | Planning | Execution | Best For | Weakness |
|---|---|---|---|---|
| Plan-and-Execute | Upfront full plan | Sequential step execution | Well-defined multi-step tasks | Poor plan propagates; no mid-plan adaptation |
| ReAct | One step at a time | Interleaved with reasoning | Exploratory, unknown state | Reasoning drift over many steps |
| Hierarchical (planner + executor) | High-level subgoals | Executor sub-plans each subgoal | Complex long-horizon tasks | Coordination overhead |
| Reflection loop | Plan, reflect, replan | Execute revised plan | Quality-critical tasks | High token cost |
The key engineering choice in plan-and-execute is how to represent the plan. A flat ordered list works for simple sequential tasks. For tasks with dependencies and parallelism opportunities, a directed acyclic graph (DAG) representation is superior: it allows the executor to identify independent steps and run them concurrently. Represent the plan as a JSON object with steps, dependencies, and status fields so it can be serialised for checkpointing and resumed after failures without re-running completed steps.
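A minimal sketch of that DAG representation, with a helper that returns the steps whose dependencies are all complete and which can therefore run concurrently (the `steps`/`depends_on`/`status` field names are one plausible schema, not a standard; the whole structure is plain JSON, so it serialises for checkpointing):

```python
plan = {
    "steps": {
        "s1": {"action": "Search for Flask stars",   "depends_on": [],           "status": "done"},
        "s2": {"action": "Search for FastAPI stars", "depends_on": [],           "status": "pending"},
        "s3": {"action": "Compare the frameworks",   "depends_on": ["s1", "s2"], "status": "pending"},
    }
}

def runnable_steps(plan: dict) -> list[str]:
    """Steps that are pending and whose dependencies have all completed."""
    steps = plan["steps"]
    return [sid for sid, s in steps.items()
            if s["status"] == "pending"
            and all(steps[d]["status"] == "done" for d in s["depends_on"])]
```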
Invest in plan quality evaluation as a separate metric from task completion quality. Have an LLM judge rate generated plans on three dimensions: completeness (does the plan cover all necessary steps?), feasibility (can each step realistically be executed with available tools?), and efficiency (does the plan avoid redundant steps?). Plans that score below threshold on any dimension should trigger a replan before execution begins, not after a costly failed execution run.
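The judging step can be one extra LLM call that returns per-dimension scores, plus a gate that triggers a replan before execution starts. A sketch of the gate and judge prompt (the JSON score format and the threshold of 3 are assumptions, not a standard):

```python
import json

# Hypothetical judge prompt; {task} and {plan} are filled in with str.format
JUDGE_PROMPT = """Rate this plan for the task on a 1-5 scale per dimension.
Task: {task}
Plan: {plan}
Return ONLY JSON: {{"completeness": n, "feasibility": n, "efficiency": n}}"""

def plan_passes(judge_response: str, threshold: int = 3) -> bool:
    """Parse the judge's JSON scores; fail if any dimension is below threshold."""
    scores = json.loads(judge_response)
    return all(scores[d] >= threshold
               for d in ("completeness", "feasibility", "efficiency"))
```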
The plan-and-execute pattern separates reasoning from action, which provides a key debugging advantage: you can inspect the plan before execution begins and catch logical errors early. In multi-step workflows where errors compound — an incorrect intermediate result corrupts all downstream steps — front-loading the planning phase and validating the plan structure before any tools are called dramatically reduces wasted API calls and time.
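Structural problems can be caught mechanically before the first tool call. A sketch of a validator for the dependency-based plan format described earlier, rejecting unknown dependencies and cycles via Kahn's algorithm (`validate_plan` and the `depends_on` field are illustrative):

```python
def validate_plan(steps: dict[str, dict]) -> list[str]:
    """Return a list of structural problems; an empty list means the plan is safe to start."""
    problems = []
    for sid, step in steps.items():
        for dep in step.get("depends_on", []):
            if dep not in steps:
                problems.append(f"{sid} depends on unknown step {dep}")
    # Cycle check: repeatedly remove steps whose remaining deps are all gone (Kahn's algorithm)
    remaining = dict(steps)
    while remaining:
        free = [sid for sid, s in remaining.items()
                if all(d not in remaining for d in s.get("depends_on", []))]
        if not free:
            problems.append(f"dependency cycle among: {sorted(remaining)}")
            break
        for sid in free:
            del remaining[sid]
    return problems
```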
Dynamic re-planning is the critical enhancement that makes plan-and-execute practical for real-world tasks. A static plan generated upfront cannot anticipate every possible outcome of intermediate steps. When an execution step returns an unexpected result — a tool call fails, an API returns empty data, or a sub-task turns out to be more complex than anticipated — the agent must update the remaining plan to account for the new information. This feedback loop between execution results and plan revision is what distinguishes robust agents from brittle ones.