Agent Frameworks

AutoGen

Microsoft's framework for building conversational multi-agent applications, where agents communicate via natural language messages in group chats or two-agent conversations.

Agent comms: conversation-based
Human-in-the-loop: supported
Code execution: built-in

SECTION 01

AutoGen's conversation model

Most frameworks wire agents together with function calls and shared state. AutoGen takes a different approach: agents communicate entirely through natural language messages, like a group chat. Each agent receives messages, decides whether to respond or pass, and sends natural language back.

This makes AutoGen feel more like coordination between humans: the UserProxy (representing you) sends a task, the AssistantAgent responds with a plan or code, you (or the proxy) provide feedback, and the cycle continues until the task is complete. The conversation history is the shared state.
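The message-passing loop can be illustrated without AutoGen at all. This is a toy sketch of the idea that the shared conversation history is the state: the agent names and canned replies are invented for illustration, and real AutoGen agents would call an LLM instead of a plain function.

```python
# Framework-free sketch of the conversation model: the shared message
# history *is* the state, and agents only ever see and append messages.

def run_conversation(agents, task, max_turns=6):
    """Pass messages between agents until one signals completion."""
    history = [{"name": "User", "content": task}]
    for turn in range(max_turns):
        speaker = agents[turn % len(agents)]   # round-robin for simplicity
        reply = speaker(history)               # agent sees the full history
        history.append({"name": speaker.__name__, "content": reply})
        if "TERMINATE" in reply:
            break
    return history

# Two toy "agents": plain functions standing in for LLM-backed agents
def Assistant(history):
    return "Here is a plan for: " + history[0]["content"]

def Critic(history):
    return "Plan looks good. TERMINATE"

log = run_conversation([Assistant, Critic], "sort a list")
```

Swapping the functions for LLM calls and the round-robin for a speaker-selection policy gives you, in essence, the two-agent and group-chat patterns shown below.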

SECTION 02

Two-agent setup

# Install: pip install pyautogen
import autogen

# Configuration for Anthropic models
config_list = [
    {
        "model": "claude-3-5-sonnet-20241022",
        "api_key": "your-anthropic-api-key",
        "api_type": "anthropic",
    }
]

llm_config = {"config_list": config_list, "temperature": 0}

# AssistantAgent: the AI that does the work
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant. When writing code, make it runnable."
)

# UserProxyAgent: represents the user, can execute code
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",       # "NEVER", "TERMINATE", or "ALWAYS"
    max_consecutive_auto_reply=5,   # auto-reply up to 5 times
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or ""),
    code_execution_config={
        "work_dir": "./workspace",
        "use_docker": False,        # set True for isolated execution
    }
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function that finds all prime numbers up to N using the Sieve of Eratosthenes, then test it with N=50."
)
SECTION 03

Group chat with multiple agents

import autogen

config_list = [{"model": "claude-3-5-sonnet-20241022", "api_key": "...", "api_type": "anthropic"}]
llm_cfg = {"config_list": config_list}

# Define specialist agents
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="You write clean Python code and revise it when the Reviewer requests changes.",
    llm_config=llm_cfg,
)
reviewer = autogen.AssistantAgent(
    name="Reviewer",
    system_message="You review code for bugs, edge cases, and style. Be critical. Say LGTM when approved.",
    llm_config=llm_cfg,
)
executor = autogen.UserProxyAgent(
    name="Executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": ".", "use_docker": False},
    is_termination_msg=lambda x: "LGTM" in (x.get("content") or "") and x.get("name") == "Reviewer"
)

# Group chat — agents take turns responding
groupchat = autogen.GroupChat(
    agents=[executor, coder, reviewer],
    messages=[],
    max_round=10,
    speaker_selection_method="auto"   # LLM decides who speaks next
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_cfg)

executor.initiate_chat(manager, message="Write and test a function that checks if a string is a palindrome.")
SECTION 04

Code execution agent

AutoGen's killer feature: the UserProxyAgent can automatically execute code that the AssistantAgent writes, return the output, and continue the conversation. This enables a full code-write-run-debug loop:

user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "./workspace",     # where code files are saved
        "use_docker": True,            # RECOMMENDED: Docker for safe execution
        "timeout": 60,                 # kill runaway processes after 60s
        "last_n_messages": 3,          # only extract code from last 3 messages
    },
    is_termination_msg=lambda msg: msg.get("content") and "TERMINATE" in msg["content"]
)
# The assistant writes code → UserProxy executes it → returns stdout/stderr
# → assistant sees the output and fixes bugs → repeat

Always use Docker in production. use_docker=False executes code directly on your machine. A hallucinated os.remove call could delete files. Docker provides sandboxed, isolated execution.

SECTION 05

Human-in-the-loop

user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="TERMINATE",   # ask human when agent wants to terminate
    # Options:
    # "NEVER"    — fully autonomous, never asks for human input
    # "TERMINATE" — asks human only when agent sends termination message
    # "ALWAYS"   — asks human for input at every turn
    max_consecutive_auto_reply=3,   # auto-reply 3 times, then ask human
)

# With "TERMINATE" mode: the agent works autonomously until it says TERMINATE,
# then prompts you to review and either provide feedback or accept the result.

TERMINATE mode is the sweet spot for production: fully autonomous for routine steps, but pauses for human review before finalising important outputs.
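A practical note on the termination checks used throughout: a message's content can be None (for example on turns that carry only tool output), so an unguarded substring test can raise. Below is a hypothetical defensive helper (not part of AutoGen) that also requires TERMINATE at the end of the message, avoiding false positives when the word merely appears mid-sentence.

```python
# Hypothetical helper: a defensive termination check that tolerates
# missing/None content and only fires on a trailing TERMINATE.

def is_termination_msg(msg: dict) -> bool:
    content = msg.get("content") or ""       # content may be None
    return content.rstrip().endswith("TERMINATE")

# user_proxy = autogen.UserProxyAgent(..., is_termination_msg=is_termination_msg)
```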

SECTION 06

AutoGen with Anthropic

import autogen

# Anthropic configuration
config_list_anthropic = [
    {
        "model": "claude-3-5-sonnet-20241022",
        "api_key": "your-anthropic-api-key",
        "api_type": "anthropic",
        "max_tokens": 4096
    }
]

# Use Haiku for less critical agents
config_list_haiku = [
    {
        "model": "claude-3-5-haiku-20241022",
        "api_key": "your-anthropic-api-key",
        "api_type": "anthropic",
    }
]

# Mix models: expensive model for planner, cheap for executors
planner = autogen.AssistantAgent(
    name="Planner",
    llm_config={"config_list": config_list_anthropic, "temperature": 0},
    system_message="You decompose tasks into clear subtasks."
)
worker = autogen.AssistantAgent(
    name="Worker",
    llm_config={"config_list": config_list_haiku, "temperature": 0},
    system_message="You execute tasks precisely as specified."
)
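One way to keep the expensive/cheap split maintainable is to decide the model tier in a single place. The helper below is purely illustrative (not an AutoGen API); it builds the same config dicts used above from a role name.

```python
# Hypothetical helper: centralise the model-per-role choice so cost
# tiers are decided in one place. Model names match the ones used above.

ROLE_MODEL = {
    "planner": "claude-3-5-sonnet-20241022",  # strong model for decomposition
    "worker":  "claude-3-5-haiku-20241022",   # cheap model for execution
}

def llm_config_for(role: str, api_key: str) -> dict:
    return {
        "config_list": [{
            "model": ROLE_MODEL[role],
            "api_key": api_key,
            "api_type": "anthropic",
        }],
        "temperature": 0,
    }
```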
SECTION 07

Gotchas

max_consecutive_auto_reply is critical. Without it, two agents can loop indefinitely (each responding to the other). Always set a limit. The is_termination_msg function is the graceful exit; max_consecutive_auto_reply is the emergency stop.

Group chat speaker selection can be unpredictable. With speaker_selection_method="auto", an LLM decides who speaks next. This can produce unexpected turn orders. For deterministic pipelines, use "round_robin" or a custom selection function.
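A custom selection function for the Coder/Executor/Reviewer group above might look like this. The routing logic itself is plain Python; newer AutoGen versions accept a callable for speaker_selection_method with signature (last_speaker, groupchat), but check your version's API before relying on that.

```python
# Deterministic Coder -> Executor -> Reviewer cycle as a custom
# speaker-selection function (agent names match the group-chat example).

ORDER = ["Coder", "Executor", "Reviewer"]

def next_speaker_name(last_name: str) -> str:
    """Pure routing logic: fixed cycle through the three roles."""
    return ORDER[(ORDER.index(last_name) + 1) % len(ORDER)]

def custom_speaker_selection(last_speaker, groupchat):
    by_name = {a.name: a for a in groupchat.agents}
    return by_name[next_speaker_name(last_speaker.name)]

# groupchat = autogen.GroupChat(agents=[...], messages=[], max_round=10,
#                               speaker_selection_method=custom_speaker_selection)
```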

Conversation history grows with every message. In long group chats, the accumulated conversation becomes very large and expensive. Configure last_n_messages in code execution config to limit what the agent sees, or periodically summarise the conversation.
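A minimal compaction sketch (plain Python, not an AutoGen API): keep the original task message, replace the middle of a long history with a placeholder summary line, and keep the most recent turns verbatim. In practice the placeholder would be produced by an LLM summarisation call.

```python
# Hedged sketch: compact a long message history by keeping the task
# message and the last few turns, collapsing everything in between.

def compact_history(messages: list[dict], keep_last: int = 4) -> list[dict]:
    if len(messages) <= keep_last + 1:
        return messages                       # short enough: leave untouched
    dropped = len(messages) - 1 - keep_last
    summary = {"name": "system",
               "content": f"[{dropped} earlier messages summarised]"}
    return [messages[0], summary] + messages[-keep_last:]
```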

Code execution is off by default in newer versions. AutoGen 0.4+ changed defaults. If code isn't executing, check your code_execution_config and ensure the work directory exists and has write permissions.

AutoGen Agent Conversation Patterns

AutoGen enables multi-agent conversations where LLM-powered agents collaborate to complete complex tasks through structured dialogue. Unlike single-agent frameworks, AutoGen models task completion as an emergent property of agents with complementary roles communicating through a shared conversation context.

Pattern             | Agents involved         | Termination            | Best for
Two-agent chat      | Assistant + UserProxy   | Human or task complete | Code generation + execution
Group chat          | Manager + N specialists | Manager decides        | Complex multi-step tasks
Sequential pipeline | A → B → C               | Last agent finishes    | Processing pipelines
Hierarchical        | Orchestrator + subteams | Orchestrator decides   | Large decomposable tasks

The UserProxy agent in AutoGen acts as the human interface in agent conversations, executing code generated by the assistant agent in a local code executor and feeding results back into the conversation. This code execution feedback loop enables agents to iteratively refine their solutions based on actual runtime outputs rather than simulated reasoning — a key capability for tasks requiring precise computation, data analysis, or automated testing. The UserProxy's human_input_mode parameter controls whether a real human can interject during the conversation or whether it proceeds fully autonomously.
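The skeleton of that feedback loop can be sketched without AutoGen: extract the first fenced Python block from an assistant message, run it in a subprocess, and hand stdout/stderr back as the next message. This is a simplified illustration, not AutoGen's actual executor (which also persists files to the work directory and supports Docker).

```python
# Sketch of the write-run-feedback loop: extract a fenced code block
# from a message, execute it in a subprocess, capture the output.
import re
import subprocess
import sys

FENCE = "`" * 3  # backtick fence marker, built here to keep the example readable

def extract_python_block(message: str):
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, message, re.DOTALL)
    return match.group(1) if match else None

def execute_block(code: str, timeout: int = 60) -> str:
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr

reply = "Here you go:\n" + FENCE + "python\nprint(2 + 2)\n" + FENCE + "\nRun it."
code = extract_python_block(reply)
output = execute_block(code)   # this output is fed back into the conversation
```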

Group chat orchestration in AutoGen uses a GroupChatManager that selects the next speaker after each message, either through a round-robin schedule, a custom speaker selection function, or by asking an LLM to choose the most appropriate next contributor. Speaker selection quality significantly affects overall task completion quality — a well-designed selection strategy routes messages to the specialist best positioned to advance the conversation, while a naive round-robin can waste turns having irrelevant agents respond when only one specialist's input is needed.

AutoGen's code execution environment supports Docker sandboxing as a security boundary around agent-generated code. Without sandboxing, code executed by the UserProxy runs with the full permissions of the host process — a significant risk when agents generate code to handle untrusted data or call external APIs. Docker execution isolates the generated code in a container with no network access by default, protecting the host system from malicious or accidental side effects. The sandbox can be configured with specific resource limits, network policies, and allowed Python packages to match the security requirements of the deployment environment.

AutoGen Studio provides a visual interface for building and testing multi-agent workflows without writing Python code. Agents, their system prompts, and their tool sets are configured through a web UI, and conversations can be tested interactively before deployment. The visual interface lowers the barrier to experimenting with multi-agent patterns for users unfamiliar with the AutoGen Python API, and the exported workflow configurations can be used directly with the Python SDK for production deployment — maintaining a clear path from prototyping to production.

Nested agents in AutoGen enable recursive task decomposition, where an agent created to solve a sub-task can itself spawn sub-agents for further decomposition. This recursive nesting is powerful for tasks with deep hierarchical structure but requires careful termination condition design — without explicit convergence criteria, nested agent conversations can proliferate indefinitely. Practical implementations typically set maximum nesting depth limits and budget constraints on the total number of LLM calls that can be triggered by a single top-level request.
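The safeguards described above (depth limit plus a total call budget) can be sketched as follows. The Budget class and the recursive solve function are illustrative, not AutoGen APIs; in a real system the recursion step would be an LLM-driven decomposition.

```python
# Hedged sketch of nested-agent safeguards: cap nesting depth and the
# total call budget for one top-level request.

class Budget:
    def __init__(self, max_depth: int = 3, max_calls: int = 50):
        self.max_depth = max_depth
        self.calls_left = max_calls

    def charge(self, depth: int) -> bool:
        """Return True if a sub-agent may run at this depth."""
        if depth > self.max_depth or self.calls_left <= 0:
            return False
        self.calls_left -= 1
        return True

def solve(task: str, budget: Budget, depth: int = 0) -> str:
    if not budget.charge(depth):
        return f"[stopped: budget exhausted for {task!r}]"
    # A real agent would call an LLM here and possibly decompose the task;
    # this toy version just recurses on a shorter task until it is trivial.
    if len(task) > 1:
        return solve(task[:-1], budget, depth + 1)
    return f"solved {task!r}"

result = solve("abc", Budget(max_depth=2, max_calls=10))
```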

Message history management in AutoGen multi-agent conversations requires attention to context window limits, especially in long-running group chats. Setting max_consecutive_auto_reply prevents infinite agent loops, and compressing the conversation history before it reaches the context window limit (for example by summarising older turns) prevents token budget overruns on long tasks. Periodically clearing the conversation and reinitialising with a summary of progress is the recommended pattern for tasks expected to span hundreds of turns.