LLMs that take actions — perceive, reason, act, observe, repeat
An agent is an LLM that can take actions — not just generate text. It perceives inputs, reasons about what to do next, calls tools, observes results, and repeats until the goal is achieved. The key difference from a plain LLM call: the model controls the loop, deciding whether to call a tool, which tool to call, and when to stop.
With a plain LLM call, you ask a question and get an answer once. The control flow is linear: Input → Model → Output. With an agent, control bounces between the model and the environment: Input → Model → Tool → Environment → Observe → Model → (loop back to Tool or Output).
ReAct (Reasoning and Acting) is the most widely used agent pattern. At each step, the LLM produces a reasoning trace (thinking out loud), then commits to an action. After observing the result, it reasons about what happened and plans the next step. This loop continues until the goal is met or a limit is reached.
1. Reason: the LLM thinks about the task and what to do next (internal monologue).
2. Act: the LLM commits to a tool call with specific inputs.
3. Observe: the tool executes and returns a result.
4. Repeat: the LLM reads the result and decides: call another tool, synthesize an answer, or ask for clarification.
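The cycle above can be sketched with a stubbed model. Here `fake_llm` and `calculator` are hypothetical stand-ins (not a real model or tool), used only to show the reason → act → observe → repeat shape:

```python
def fake_llm(history: str) -> str:
    # Stub "model": reasons, then either acts or finishes.
    if "Observation: 4" in history:
        return "Thought: I have the answer.\nFinal Answer: 4"
    return "Thought: I should compute 2 + 2.\nAction: calculator[2 + 2]"

def calculator(expr: str) -> str:
    return str(eval(expr))  # demo only; never eval untrusted input

def react(task: str, max_steps: int = 5) -> str:
    history = f"Task: {task}"
    for _ in range(max_steps):
        output = fake_llm(history)           # 1. Reason (and propose an action)
        if "Final Answer:" in output:        # explicit stop signal
            return output.split("Final Answer:")[1].strip()
        action = output.split("Action: calculator[")[1].rstrip("]")
        observation = calculator(action)     # 2-3. Act, then observe the result
        history += f"\n{output}\nObservation: {observation}"  # 4. Feed back, repeat
    return "step limit reached"

print(react("What is 2 + 2?"))  # → 4
```

A real implementation replaces `fake_llm` with an API call and parses the model's structured tool-call output instead of splitting strings, but the control flow is the same.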
Agents are powerful but add latency, complexity, and cost (multiple LLM calls). Plain LLM calls are fast and simple but can't adapt to new information or errors. Choose based on your task structure and constraints.
| Scenario | Use plain LLM call | Use agent |
|---|---|---|
| Single clear task | ✓ Yes | No |
| No external data needed | ✓ Yes | No |
| Strict latency <1s | ✓ Yes | No |
| Low cost priority | ✓ Yes | No |
| Multi-step task | No | ✓ Yes |
| Need external data | No | ✓ Yes |
| Steps unpredictable | No | ✓ Yes |
| Can verify & retry errors | No | ✓ Yes |
Every agent has four essential components. All four must work together for the agent to function effectively. Missing or weak components lead to agents that loop indefinitely, forget context, or make poor decisions.
The language model decides what to do next. It reasons about the current state, selects a tool, and structures the tool inputs. The LLM must be capable enough to understand the task and the available tools. Smaller models may fail at tool selection; larger models (Claude 3, GPT-4) are more reliable.
What matters: Model capability, tool-calling accuracy, reasoning coherence. Fast models like Claude 3 Haiku work for simple agents; complex reasoning requires stronger models.
Tools are functions the agent can call: search_web, query_database, execute_code, send_email, etc. Each tool takes inputs and returns results. The agent only knows about tools you give it; without a search tool, it can't do research. Without a code execution tool, it can't write and run code.
What matters: Tool clarity, correctness, speed. Document what each tool does, what inputs it needs, and what it returns. Buggy tools cause agent errors.
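As an illustration of tool clarity, here is one way to document a single tool: a JSON-schema description the model sees, plus the function that actually runs. The schema shape follows Anthropic's tool-use format; `query_database` is a toy that queries a throwaway in-memory SQLite table:

```python
import sqlite3

# What the LLM sees: name, purpose, and exactly what inputs are required.
query_database_schema = {
    "name": "query_database",
    "description": "Run a read-only SQL query and return rows as a list of dicts.",
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "A single SELECT statement."}
        },
        "required": ["sql"],
    },
}

# What actually runs: a toy in-memory database for illustration.
def query_database(sql: str) -> list[dict]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Ada')")
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(sql).fetchall()]

print(query_database("SELECT * FROM users"))  # → [{'id': 1, 'name': 'Ada'}]
```

The description and `input_schema` are not decoration: they are the only information the model has when deciding whether and how to call the tool.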
Memory stores the conversation history and intermediate results so the agent can build on previous steps. Without it, the agent forgets why it started or what it has learned. There are two kinds: short-term memory (the current conversation) and long-term memory (past conversations and facts about the user or domain).
What matters: Relevance, not volume. Too much memory confuses the LLM; too little causes the agent to forget. Use retrieval to fetch only relevant facts, not the entire knowledge base.
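"Relevance, not volume" can be sketched as: keep only the last few turns verbatim, and pull older facts in by similarity to the current query. The keyword-overlap scoring below is a hypothetical stand-in for a real embedding-based retriever:

```python
def build_context(long_term: list[str], history: list[str],
                  query: str, keep_last: int = 4, top_k: int = 2) -> list[str]:
    # Score long-term facts by word overlap with the current query
    # (a crude proxy for semantic retrieval).
    q_words = set(query.lower().split())
    scored = sorted(long_term,
                    key=lambda fact: len(q_words & set(fact.lower().split())),
                    reverse=True)
    relevant = scored[:top_k]          # only the most relevant facts
    recent = history[-keep_last:]      # short-term: recent turns only
    return relevant + recent

facts = ["user prefers metric units", "user is based in Berlin",
         "user's dog is named Rex"]
history = [f"turn {i}" for i in range(20)]   # a long conversation
ctx = build_context(facts, history, "what is the weather in Berlin")
print(ctx)
```

The point is that the context handed to the LLM stays small and on-topic even as the full history and fact store grow.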
The loop is the `while` loop that keeps calling the LLM, parsing tool calls, executing tools, and feeding results back until a stopping condition is met. The loop needs termination conditions: a maximum iteration count, an output token limit, an explicit "finished" signal, or a timeout.
What matters: Termination guarantees. An agent without loop limits can run forever, burning tokens and time. Always set max_iterations (e.g., 10) and monitor for infinite loops.
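A guarded loop with all three guarantees might look like the sketch below, where `step` stands in for one LLM-call-plus-tool-execution round (the stub here "finishes" on its third call):

```python
import time

def run_loop(step, max_iterations: int = 10, timeout_s: float = 30.0) -> dict:
    start = time.monotonic()
    for i in range(max_iterations):
        if time.monotonic() - start > timeout_s:      # wall-clock guard
            return {"status": "timeout", "iterations": i}
        result = step(i)                              # one LLM + tool round
        if result is not None:                        # explicit "finished" signal
            return {"status": "done", "answer": result, "iterations": i + 1}
    return {"status": "max_iterations", "iterations": max_iterations}

# Stub step function: returns an answer on the third call, None before that.
outcome = run_loop(lambda i: "42" if i == 2 else None)
print(outcome)  # → {'status': 'done', 'answer': '42', 'iterations': 3}
```

Returning a structured status rather than just the answer also makes it easy to monitor how often agents hit the limits instead of finishing cleanly.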
Here's a complete, runnable agent in Python using the Anthropic SDK. It loops, calls a tool, observes results, and decides when to stop.
1. Initialize: create tools and define the agent function.
2. Loop: call the LLM with the current message history and available tools.
3. Check the stop reason: if `end_turn`, the LLM is done; extract the final text response and return it.
4. Parse tool calls: if the LLM called a tool, extract the tool name and inputs.
5. Execute: run the tool and collect the results.
6. Feed back: append the assistant's response and the tool results to the message history.
7. Loop again: call the LLM with the updated history until `stop_reason` is `end_turn`.
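Those steps, as a minimal sketch using the Anthropic SDK (`pip install anthropic`; the API key is read from the `ANTHROPIC_API_KEY` environment variable). The `get_weather` tool is a stub for illustration, and the model name is an assumption — substitute whichever model you use:

```python
def get_weather(city: str) -> str:
    # Stub tool; a real version would call a weather API.
    return f"Sunny, 22°C in {city}"

# Step 1: tool definitions the LLM will see.
TOOLS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def execute_tool(name: str, args: dict) -> str:
    if name == "get_weather":
        return get_weather(**args)
    return f"Unknown tool: {name}"

def run_agent(task: str, max_iterations: int = 10) -> str:
    import anthropic  # deferred import so the helpers above work without the SDK installed
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        # Step 2: call the LLM with the history and available tools.
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # assumed model name
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        # Step 3: if the model stopped on its own, return its final text.
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Steps 4-6: parse tool calls, execute them, feed results back.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": execute_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Stopped: hit max_iterations"

if __name__ == "__main__":
    print(run_agent("What's the weather in Paris?"))
```

Note that tool results go back as a `user` message containing `tool_result` blocks keyed by the tool call's `id`; mismatched IDs are a common source of broken loops.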
Once you understand the basics, deepen your knowledge in these areas. Each is a complete concept page with code and examples.
How agents decompose complex goals into action sequences. Task decomposition, subgoal setting, and hierarchical planning enable agents to tackle problems that require many steps and decisions.
Short-term context windows, long-term retrieval, and episodic memory. How agents remember facts, learn from past interactions, and retrieve relevant history to inform decisions.
Function calling, API integration, and tool composition. How agents invoke external systems, handle errors, and chain tool calls together.
LangChain, LangGraph, CrewAI, AutoGPT. Production frameworks that handle tool routing, memory management, observability, and multi-agent patterns.
Orchestrator-worker patterns, parallel execution, and consensus. How multiple agents collaborate, delegate tasks, and solve complex problems.
Single agents hit limits on task complexity, context length, and reliability. Multi-agent systems split work across specialised sub-agents, each with focused tools and instructions. The orchestrator delegates; sub-agents execute. This enables parallelism, role specialisation, and independent verification (one agent checks another's work).
Common topologies: Orchestrator → Workers (one coordinator, many specialists), Pipeline (sequential agents in a fixed order), and Debate/Critique (two agents argue a point, a judge decides). LangGraph and CrewAI both implement graph-based multi-agent workflows. For most teams, start with 2 agents max — orchestration complexity grows faster than the benefits.
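The orchestrator → workers topology fits in a few lines of plain Python. Each worker below is a stub for a specialised sub-agent (a real one would wrap its own LLM call and tools, e.g. the loop shown earlier), but the delegation, hand-off, and verification structure is the real pattern:

```python
def research_worker(topic: str) -> str:
    return f"notes on {topic}"            # stub: would search and summarise

def writing_worker(notes: list[str]) -> str:
    return "Draft: " + "; ".join(notes)   # stub: would draft from the notes

def review_worker(draft: str) -> bool:
    return draft.startswith("Draft:")     # stub: independently checks another agent's work

def orchestrate(topics: list[str]) -> str:
    notes = [research_worker(t) for t in topics]   # fan out (parallelisable)
    draft = writing_worker(notes)                  # sequential hand-off
    if not review_worker(draft):                   # independent verification
        raise RuntimeError("draft rejected; retry or escalate")
    return draft

print(orchestrate(["agents", "memory"]))
# → Draft: notes on agents; notes on memory
```

Even this toy shows why orchestration cost grows quickly: every edge between agents needs an interface, an error path, and a decision about who retries.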
| Framework | Strengths | Best for |
|---|---|---|
| LangGraph | Stateful graphs, cycles, persistence | Complex workflows with branching |
| CrewAI | Role-based agents, easy setup | Business process automation |
| AutoGen (Microsoft) | Conversational multi-agent | Research, debate, code generation |
| Plain Python | Full control, no framework overhead | Simple 2–3 agent pipelines |