AUTONOMOUS AGENTS

AI Agents

LLMs that take actions — perceive, reason, act, observe, repeat

The loop: reason → act → observe
The components: LLM + tools + memory
The key decision: agent vs pipeline
Contents
  1. What makes something an agent
  2. The ReAct loop
  3. When to use agents vs plain calls
  4. The 4 components
  5. Minimal code example
  6. What to explore next
  7. References
01 — Foundation

What Makes Something an Agent

An agent is an LLM that can take actions — not just generate text. It perceives inputs, reasons about what to do next, calls tools, observes results, and repeats until the goal is achieved. The key difference from a plain LLM call: the model controls the loop, deciding whether to call a tool, which tool to call, and when to stop.

With a plain LLM call, you ask a question and get an answer once. The control flow is linear: Input → Model → Output. With an agent, control bounces between the model and the environment: Input → Model → Tool → Environment → Observe → Model → (loop back to Tool or Output).

Plain LLM vs Agent

PLAIN LLM CALL (one shot):
  Input → LLM → Output (done, no loop)

AGENT (loop until done):
  Input → LLM (decide) → Tool call → Result → LLM (observe, decide next) → (loop or stop)
  (LLM controls the flow)
💡 Key insight: Agents trade latency and complexity for autonomy. The model becomes an active decision-maker instead of a passive responder. This is powerful for multi-step tasks, exploration, and error recovery — but it's not always necessary.
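The two control flows can be sketched in a few lines of Python. Here `llm()` and `run_tool()` are hypothetical stubs standing in for a real model and a real tool, so the contrast between a one-shot call and a model-controlled loop runs offline:

```python
def llm(messages):
    # Pretend model: asks for a tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "name": "search", "input": "OpenAI CEO"}
    return {"type": "answer", "text": "Sam Altman"}

def run_tool(name, inp):
    return f"result of {name}({inp!r})"

# Plain LLM call: one shot, linear control flow.
def plain_call(question):
    return llm([{"role": "user", "content": question}])

# Agent: the model decides whether to call a tool again or stop.
def agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = llm(messages)
        if step["type"] == "answer":          # model chose to stop
            return step["text"]
        result = run_tool(step["name"], step["input"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step limit reached")

print(agent("Who is the CEO of OpenAI?"))
```

The only structural difference is the loop: the agent feeds each observation back and lets the model decide the next move.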
02 — Control Flow

The ReAct Loop

ReAct (Reasoning and Acting) is the most widely used agent pattern. At each step, the LLM produces a reasoning trace (thinking out loud), then commits to an action. After observing the result, it reasons about what happened and plans the next step. This loop continues until the goal is met or a limit is reached.

ReAct Cycle

1. Reason: LLM thinks about the task and what to do next (internal monologue).
2. Act: LLM commits to a tool call with specific inputs.
3. Observe: Tool executes, returns a result.
4. Repeat: LLM reads the result and decides: call another tool, synthesize an answer, or ask for clarification.

Task: "What is the CEO of OpenAI's birthday?"

REASON:  I need to find out who the CEO of OpenAI is, then look up their birthday.
ACT:     search_web("OpenAI CEO")
OBSERVE: [Result: Sam Altman is the CEO of OpenAI]
REASON:  Now I know Sam Altman is the CEO. I need to find his birthday.
ACT:     search_web("Sam Altman birthday")
OBSERVE: [Result: Sam Altman was born on April 22, 1985]
REASON:  I have the answer.
OUTPUT:  Sam Altman, CEO of OpenAI, was born on April 22, 1985.
Why ReAct works: The reasoning step makes agent behavior interpretable — you can see why it chose each tool. This is essential for debugging, monitoring, and building user trust. It also helps the LLM avoid mistakes by "thinking before acting."
03 — Decision

When to Use Agents vs Plain LLM Calls

Agents are powerful but add latency, complexity, and cost (multiple LLM calls). Plain LLM calls are fast and simple but can't adapt to new information or errors. Choose based on your task structure and constraints.

Comparison

Scenario                    | Plain LLM call | Agent
Single clear task           | ✓ Yes          | No
No external data needed     | ✓ Yes          | No
Strict latency (<1 s)       | ✓ Yes          | No
Low cost priority           | ✓ Yes          | No
Multi-step task             | No             | ✓ Yes
Need external data          | No             | ✓ Yes
Steps unpredictable         | No             | ✓ Yes
Can verify & retry errors   | No             | ✓ Yes
⚠️ Agent latency compounds: Multiple tool calls, LLM reasoning, and waiting for results stack up. A 3-step agent needs 3 LLM calls plus network delays. For user-facing features, set max_iterations and timeout limits to avoid unpredictable wait times.
04 — Architecture

The 4 Components

Every agent has four essential components. All four must work together for the agent to function effectively. Missing or weak components lead to agents that loop indefinitely, forget context, or make poor decisions.

1. LLM — The Decision Maker

The language model decides what to do next. It reasons about the current state, selects a tool, and structures the tool inputs. The LLM must be capable enough to understand the task and the available tools. Smaller models may fail at tool selection; larger models (Claude 3, GPT-4) are more reliable.

What matters: Model capability, tool-calling accuracy, reasoning coherence. Fast models like Claude 3 Haiku work for simple agents; complex reasoning requires stronger models.

2. Tools — The Environment

Tools are functions the agent can call: search_web, query_database, execute_code, send_email, etc. Each tool takes inputs and returns results. The agent only knows about tools you give it; without a search tool, it can't do research. Without a code execution tool, it can't write and run code.

What matters: Tool clarity, correctness, speed. Document what each tool does, what inputs it needs, and what it returns. Buggy or vaguely described tools cause agent errors.
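One way to keep tools well-documented and robust is a small registry that validates inputs before dispatch and returns errors as text the model can react to. This is a sketch; the `search_docs` tool and the registry shape are illustrative, not a framework's API:

```python
TOOLS = {
    "search_docs": {
        "description": "Search company documentation. Input: {'query': str}. "
                       "Returns a short text snippet.",
        "required": ["query"],
        "fn": lambda inputs: f"[docs matching {inputs['query']!r}]",
    },
}

def call_tool(name, inputs):
    spec = TOOLS.get(name)
    if spec is None:
        return f"error: unknown tool {name!r}"      # surface errors as text...
    missing = [k for k in spec["required"] if k not in inputs]
    if missing:
        return f"error: missing inputs {missing}"   # ...so the LLM can recover
    return spec["fn"](inputs)

print(call_tool("search_docs", {"query": "auth"}))
print(call_tool("search_docs", {}))
```

Returning error strings instead of raising keeps the loop alive: the model sees the failure as an observation and can retry with corrected inputs.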

3. Memory — The Context

Memory stores the conversation history and intermediate results so the agent can build on previous steps. Without memory, the agent forgets why it started or what it learned. There are two types: short-term memory (the current conversation) and long-term memory (past conversations and facts about the user or domain).

What matters: Relevance, not volume. Too much memory confuses the LLM; too little causes the agent to forget. Use retrieval to fetch only relevant facts, not the entire knowledge base.
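"Relevance, not volume" can be sketched as top-k retrieval over stored facts. This toy version scores by word overlap with the query purely for illustration; a production system would use embedding similarity instead:

```python
def retrieve(memory, query, k=2):
    # Score each stored fact by how many query words it shares, keep the top k.
    q = set(query.lower().split())
    scored = sorted(memory, key=lambda fact: -len(q & set(fact.lower().split())))
    return scored[:k]

memory = [
    "User prefers answers in French",
    "Company uses PostgreSQL 15",
    "The database schema was migrated in March",
    "User's favorite color is blue",
]

print(retrieve(memory, "which database does the company use?"))
```

Only the two best-matching facts reach the prompt; the rest of the store stays out of the context window.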

4. Loop — The Control Flow

The loop is the while-true that keeps calling the LLM, parsing tool calls, executing tools, and feeding results back until a stopping condition is met. The loop needs termination conditions: max iterations, output token limit, explicit "finished" signal, or timeout.

What matters: Termination guarantees. An agent without loop limits can run forever, burning tokens and time. Always set max_iterations (e.g., 10) and monitor for infinite loops.
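The termination guarantees above can be packaged as a budget wrapper around the per-step work: a hard cap on iterations, a wall-clock deadline, and an explicit "finished" signal. `run_with_budget` and `toy_step` are illustrative names, not any framework's API:

```python
import time

def run_with_budget(step_fn, max_iterations=10, timeout_s=8.0):
    deadline = time.monotonic() + timeout_s
    state = {"done": False, "answer": None, "steps": 0}
    for _ in range(max_iterations):          # hard cap on LLM calls
        if time.monotonic() > deadline:      # hard cap on wall-clock time
            raise TimeoutError("agent exceeded latency budget")
        step_fn(state)                       # one reason/act/observe step
        state["steps"] += 1
        if state["done"]:                    # explicit "finished" signal
            return state["answer"]
    raise RuntimeError("agent hit max_iterations without finishing")

# Toy step function: signals completion on its third call.
def toy_step(state):
    if state["steps"] == 2:
        state["done"], state["answer"] = True, "42"

print(run_with_budget(toy_step))
```

Whichever limit fires first wins, so the worst-case cost of a run is bounded before it starts.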

⚠️ Weak components cause common failures: Weak LLM → misses tools or reasons poorly. Weak tools → return wrong results. Weak memory → forgets context. No loop limits → the agent hangs.
05 — Implementation

Minimal Working Code Example

Here's a complete, runnable agent in Python using the Anthropic SDK. It loops, calls a tool, observes results, and decides when to stop.

from anthropic import Anthropic

client = Anthropic()

tools = [{
    'name': 'search_docs',
    'description': 'Search company documentation',
    'input_schema': {
        'type': 'object',
        'properties': {'query': {'type': 'string', 'description': 'Search query'}},
        'required': ['query'],
    },
}]

def run_tool(name: str, inputs: dict) -> str:
    if name == 'search_docs':
        return f'[Search results for "{inputs["query"]}"] ...relevant content...'
    return 'Unknown tool'

def agent(user_message: str) -> str:
    messages = [{'role': 'user', 'content': user_message}]
    while True:
        resp = client.messages.create(
            model='claude-opus-4-5',
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        # The model signals it is done by ending its turn with plain text.
        if resp.stop_reason == 'end_turn':
            return next(b.text for b in resp.content if hasattr(b, 'text'))
        # Otherwise, execute every tool call and feed the results back.
        tool_results = []
        for block in resp.content:
            if block.type == 'tool_use':
                result = run_tool(block.name, block.input)
                tool_results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': result,
                })
        messages += [
            {'role': 'assistant', 'content': resp.content},
            {'role': 'user', 'content': tool_results},
        ]

print(agent('What does our documentation say about authentication?'))

How it works

1. Initialize: Create tools and define the agent function.
2. Loop: Call the LLM with the current message history and available tools.
3. Check stop reason: If 'end_turn', the LLM is done — extract the final text response and return.
4. Parse tool calls: If the LLM called a tool, extract the tool name and inputs.
5. Execute: Run the tool and collect results.
6. Feed back: Add the assistant's response and tool results to the message history.
7. Loop again: Call the LLM with the updated history until stop_reason is 'end_turn'.

This is the production pattern: frameworks like LangGraph and CrewAI build on this loop, adding features like human-in-the-loop review, memory management, and multi-agent coordination. Start here, then move to a framework as you add complexity.
06 — Growth

What to Explore Next

Once you understand the basics, deepen your knowledge in these areas. Each is a complete concept page with code and examples.

Core Agent Topics

1. Agent Planning — Breaking down goals

How agents decompose complex goals into action sequences. Task decomposition, subgoal setting, and hierarchical planning enable agents to tackle problems that require many steps and decisions.

2. Agent Memory — Retaining context

Short-term context windows, long-term retrieval, and episodic memory. How agents remember facts, learn from past interactions, and retrieve relevant history to inform decisions.

3. Tool Use — Calling APIs and functions

Function calling, API integration, and tool composition. How agents invoke external systems, handle errors, and chain tool calls together.

4. Agent Frameworks — Building at scale

LangChain, LangGraph, CrewAI, AutoGPT. Production frameworks that handle tool routing, memory management, observability, and multi-agent patterns.

5. Multi-Agent Coordination — Working together

Orchestrator-worker patterns, parallel execution, and consensus. How multiple agents collaborate, delegate tasks, and solve complex problems.

💡 Learning path: Start with agent planning, then tool use, then memory. Frameworks and multi-agent patterns come last, once you understand the fundamentals.

Design Tips

⚠️ Start simple, add complexity carefully: Begin with a single agent and one or two tools. Add memory, parallelism, and multi-agent coordination only once the simpler version provably fails — agent complexity compounds quickly. A single agent with good tools often outperforms a complex multi-agent system with weak tools.
07 — Further Reading

References

Academic Papers
  • Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
  • Wang, L. et al. (2024). A Survey on Large Language Model based Autonomous Agents. arXiv:2308.11432.

Multi-Agent Systems

Single agents hit limits on task complexity, context length, and reliability. Multi-agent systems split work across specialized sub-agents, each with focused tools and instructions. The orchestrator delegates; sub-agents execute. This enables parallelism, role specialization, and independent verification (one agent checks another's work).

Common topologies: Orchestrator → Workers (one coordinator, many specialists), Pipeline (sequential agents in a fixed order), and Debate/Critique (two agents argue a point, a judge decides). LangGraph and CrewAI both implement graph-based multi-agent workflows. For most teams, start with 2 agents max — orchestration complexity grows faster than the benefits.

# Minimal 2-agent system: researcher + writer
import openai

client = openai.OpenAI()

def run_agent(system: str, task: str, model="gpt-4o-mini") -> str:
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": task}],
    ).choices[0].message.content

# Agent 1: Researcher — gathers and structures facts
research = run_agent(
    system="You are a research agent. Extract key facts, statistics, and concepts. "
           "Output as structured bullet points.",
    task="Research topic: 'How does PagedAttention improve LLM serving throughput?'",
)

# Agent 2: Writer — turns research into polished content
article = run_agent(
    system="You are a technical writer. Turn research notes into a clear 150-word explanation "
           "for a software engineer audience. No jargon without definition.",
    task=f"Write an explanation based on this research: {research}",
)
print(article)

# Agent 3 (optional): Critic — validates quality before delivery
critique = run_agent(
    system="You are a critical reviewer. Identify any inaccuracies, gaps, or unclear statements. "
           "Be concise. If the content is good, just say 'APPROVED'.",
    task=f"Review this explanation: {article}",
)
print(f"Critique: {critique}")
Framework           | Strengths                            | Best for
LangGraph           | Stateful graphs, cycles, persistence | Complex workflows with branching
CrewAI              | Role-based agents, easy setup        | Business process automation
AutoGen (Microsoft) | Conversational multi-agent           | Research, debate, code generation
Plain Python        | Full control, no framework overhead  | Simple 2–3 agent pipelines