Agent Infrastructure

LLM Agent Frameworks

LangChain, LangGraph, CrewAI, AutoGen, and when to build your own — compared

tool → memory → plan → act: the agent loop
orchestrate vs. code: the key choice
5 major frameworks in active use
Contents
  1. What makes a framework
  2. Landscape & comparison
  3. LangGraph deep dive
  4. CrewAI multi-role
  5. AutoGen conversational
  6. Decision guide
  7. Observability & tracing
01 — Foundation

What Makes a Framework?

An agent framework gives you tool dispatch, memory management, multi-step planning, and observability — instead of writing it yourself. Rather than stringing together 500 lines of glue code with OpenAI SDK calls, a framework provides the scaffolding.

Four Layers Every Agent Needs

(1) LLM interface layer: abstracts the specific model (Claude, GPT-4, Llama); sends structured prompts, parses tool calls, handles errors.
(2) Tool registry: defines which tools the agent can access, their schemas, and how to execute them.
(3) Memory/state store: persists conversation history, internal state, and past decisions.
(4) Orchestration loop: runs the agentic loop itself: reason, act, observe, update state, repeat.
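The four layers can be sketched in a few lines of pure Python. This is an illustrative toy, not any framework's API: `fake_llm` is a stub standing in for the LLM interface layer, and the tool registry is just a dict.

```python
# Minimal sketch of the four layers, in plain Python with a stubbed model.
# All names here are illustrative, not from any specific framework.

def fake_llm(messages):
    """(1) LLM interface layer: stand-in for a real model call."""
    last = messages[-1]["content"]
    if "2+2" in last:
        # "decide" to call a tool
        return {"tool": "calculator", "args": {"expr": "2+2"}}
    return {"final": last.upper()}  # no tool needed: answer directly

# (2) Tool registry: tool name -> callable
TOOLS = {"calculator": lambda args: eval(args["expr"], {"__builtins__": {}})}

def run_agent(user_input, max_steps=5):
    memory = [{"role": "user", "content": user_input}]  # (3) memory/state store
    for _ in range(max_steps):                          # (4) orchestration loop
        decision = fake_llm(memory)                     # reason
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["tool"]](decision["args"])        # act
        memory.append({"role": "tool", "content": str(result)})   # observe
    raise RuntimeError("agent did not finish")
```

Every framework below is, at its core, a more robust version of this loop with persistence, retries, and observability bolted on.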

Build vs buy question: frameworks accelerate prototyping but add abstraction overhead. Many teams start with a framework, then rewrite the hot path in plain Python when they hit production constraints.

⚠️ Every framework has an "easy path" and a "production path". They are not the same. Read production case studies before committing. The tutorial examples often hide the complexity of scaling, latency debugging, and cost optimization.
02 — Landscape

Framework Comparison

Here's a full comparison of the major frameworks used in production:

| Framework | Paradigm | State handling | Multi-agent | Best for |
|---|---|---|---|---|
| LangChain | DAG / chain | Stateless by default | Via agents | Quick prototypes, chains |
| LangGraph | Stateful graph | First-class state | Yes (native) | Complex workflows, human-in-loop |
| CrewAI | Role-based agents | Per-agent memory | Yes (crews) | Collaborative agent teams |
| AutoGen | Conversational | Conversation history | Yes (group chat) | Research, multi-agent chat |
| Pydantic AI | Type-safe agents | Structured I/O | Limited | Production, typed pipelines |
| LlamaIndex | Data-centric | Index + query | Via workflows | RAG-heavy, document QA |
| Plain Python + SDK | Manual | Whatever you build | Whatever you build | Full control, high scale |
03 — Deep Dive

LangGraph

LangGraph models agents as stateful graphs: nodes are actions or decisions, edges are transitions, state is a typed dict persisted across steps. This makes it powerful for workflows that need to pause, inspect, and resume.

Key Concepts

StateGraph: define your state schema as a TypedDict.
Nodes: Python functions that take state and return modified state.
Edges: deterministic or conditional transitions between nodes.
Checkpointers: persist state to Postgres, Redis, memory, or a custom backend.
Human-in-the-loop: use interrupt_before and interrupt_after to pause execution for human review.

Minimal LangGraph Agent

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

# `llm`, `has_tool_calls`, and `call_tools` are assumed to be defined elsewhere.

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]  # appended, not overwritten
    tool_results: list

def call_llm(state: AgentState) -> AgentState:
    # call the LLM with the accumulated messages
    response = llm.invoke(state['messages'])
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    # route to the tools node if the last message requested a tool call
    if has_tool_calls(state['messages'][-1]):
        return "tools"
    return END

graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", call_tools)
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm")  # loop back to the LLM after executing tools
graph.set_entry_point("llm")
app = graph.compile()
LangGraph's persistence layer is critical for production agents. You can pause mid-execution, route to a human reviewer, and resume from exactly where you left off — essential for systems that touch real data or high-stakes decisions.
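The pause/resume mechanics can be illustrated without LangGraph at all. The toy checkpointer below is a hand-rolled sketch of the idea (snapshot state after each step, interrupt before a named step, resume from the last snapshot); it is not LangGraph's actual API.

```python
import copy

class MemoryCheckpointer:
    """Toy checkpointer: snapshot state after every completed step."""
    def __init__(self):
        self.store = {}

    def save(self, thread_id, step, state):
        self.store[thread_id] = (step, copy.deepcopy(state))

    def load(self, thread_id):
        return self.store.get(thread_id, (0, None))

def run(steps, state, cp, thread_id, interrupt_before=None):
    """Run named steps in order, checkpointing after each one."""
    start, saved = cp.load(thread_id)
    if saved is not None:
        state = saved  # resume from the last checkpoint
    for i in range(start, len(steps)):
        name, fn = steps[i]
        if name == interrupt_before:
            return "paused", state  # hand off to a human reviewer
        state = fn(state)
        cp.save(thread_id, i + 1, state)
    return "done", state

# Pause before "publish" for review, then resume from the checkpoint.
steps = [("draft", lambda s: s + ["drafted"]),
         ("publish", lambda s: s + ["published"])]
cp = MemoryCheckpointer()
status, state = run(steps, [], cp, "t1", interrupt_before="publish")  # paused
status, state = run(steps, [], cp, "t1")  # human approved: resumes at step 2
```

LangGraph's checkpointers do the same thing per thread ID, with durable backends instead of an in-process dict.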
04 — Deep Dive

CrewAI and Multi-Role Agents

CrewAI models each agent as a role with a goal, backstory, tools, and memory. A Crew orchestrates agents sequentially or hierarchically; a manager agent can delegate to specialized agents. Best for content pipelines, research tasks, and multi-perspective synthesis.

Role-Based Mental Model

Each agent has a clear persona and responsibility. This works exceptionally well when different "experts" are conceptually clean — a researcher agent, an analyst agent, a writer agent. The abstraction maps cleanly to product workflows and makes prompting more intuitive.

3-Agent Research Crew Example

from crewai import Agent, Task, Crew

# search_tool is assumed to be a tool instance defined elsewhere
researcher = Agent(
    role='Senior Researcher',
    goal='Find latest AI papers',
    backstory='Expert at finding academic sources',
    tools=[search_tool],
)
analyst = Agent(
    role='Data Analyst',
    goal='Synthesize findings',
    backstory='Turns raw research into actionable insights',
)
writer = Agent(
    role='Technical Writer',
    goal='Write clear summary',
    backstory='Makes complex topics accessible',
)

tasks = [
    Task(description='Research LLM benchmarks', agent=researcher),
    Task(description='Analyze trends', agent=analyst),
    Task(description='Write 500-word brief', agent=writer),
]

crew = Crew(agents=[researcher, analyst, writer], tasks=tasks)
result = crew.kickoff()

CrewAI is particularly strong when you need sequential task dependencies and role-based decomposition. The trade-off is less fine-grained control over state compared to LangGraph.

05 — Deep Dive

AutoGen Conversational Agents

AutoGen treats agents as conversational participants: each agent has a system prompt, can send and receive messages, and can be a human-proxy or an LLM-backed assistant. GroupChat enables round-robin or managed multi-agent conversations with code execution loops.

Code Execution Loop

AssistantAgent can generate code; UserProxyAgent executes it locally and returns feedback. This creates a tight loop: the LLM generates code, the proxy executes it, the results are reported back, and the LLM refines. Exceptionally effective for code-generation and problem-solving tasks where ideas can be tested immediately.

AutoGen Code-Writing Loop

import autogen

assistant = autogen.AssistantAgent(
    "assistant",
    llm_config={"model": "gpt-4o", "api_key": "..."},
)
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    code_execution_config={"work_dir": "coding", "use_docker": False},
)
user_proxy.initiate_chat(
    assistant,
    message="Write and test a Python function to compute Fibonacci numbers",
)

Comparison: LangGraph vs CrewAI vs AutoGen

| Aspect | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Mental model | State machine | Role-based crew | Chat participants |
| State | Typed dict, persistent | Per-agent memory | Conversation history |
| Control flow | Graph edges | Sequential/hierarchical | Conversation-driven |
| Human in loop | Yes (interrupt) | Limited | Yes (UserProxyAgent) |
| Best fit | Production workflows | Content/research tasks | Code generation, research |
06 — Decision Guide

Which Framework?

The right choice depends on your task shape, team maturity, and scale constraints:

1. Starting a prototype — fastest path

Use LangChain for simple chains; LangGraph the moment you need state or branching. LangChain is easiest to onboard with; LangGraph becomes necessary as soon as your workflow requires conditional logic or pause/resume.

2. Collaboration between specialized agents — role-based

CrewAI. The role-based mental model maps cleanly to product workflows. Researcher, analyst, writer — each with their own backstory and tools — is a natural decomposition for many tasks.

3. Code-writing or research agents — execution feedback

AutoGen. Conversational loop + code execution is uniquely good here. The agent proposes, you test, it refines — tight feedback loop beats iterative planning for these domains.

4. High-scale production — full control

Plain Python + async + OpenAI SDK. Frameworks add latency and debugging friction. Write the abstraction you need, not a general one. Most successful production agents eventually do this.
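To make the trade-off concrete, here is a sketch of such a hand-rolled loop. `call_model` is a stub standing in for a real SDK call (e.g. a chat-completions request with tool definitions), so the control flow is visible; production versions are typically async and add retries, timeouts, and budget caps.

```python
# Hand-rolled agent loop: the abstraction you need, and nothing else.

def call_model(messages, tools):
    # Stub for a real SDK call, e.g.:
    #   client.chat.completions.create(model=..., messages=..., tools=...)
    # Here: request one tool call, then answer once a tool result exists.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search",
                              "arguments": {"q": messages[0]["content"]}}}
    return {"content": "answer based on: " + messages[-1]["content"]}

def agent_loop(question, tools, max_turns=8):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = call_model(messages, tools)
        if "tool_call" not in reply:
            return reply["content"]  # model answered directly
        call = reply["tool_call"]
        result = tools[call["name"]](**call["arguments"])  # dispatch
        messages.append({"role": "tool", "content": str(result)})
    raise TimeoutError("exceeded max_turns")

tools = {"search": lambda q: f"results for {q!r}"}
```

The whole loop fits on one screen, which is precisely why debugging and profiling it is easier than tracing through framework internals.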

⚠️ Framework churn is real. AutoGen 0.4 completely broke the AutoGen 0.2 API. LangChain rewrote its expression language twice. Pin versions, test upgrades in staging, and maintain a fallback plan to rewrite in plain Python if the framework becomes a constraint.
07 — Observability

Tracing and Debugging Agents

Multi-step agents are hard to debug. Which step failed? Which LLM call was slow? What context caused the wrong decision? Tracing becomes critical in production.

What to Trace

Every LLM call: input tokens, output tokens, latency, model used.
Every tool call: tool name, arguments, result, execution duration.
Every state transition: node entry, node exit, state delta.
Errors: exception type, stack trace, recovery action.
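As a minimal illustration, a decorator can capture these events as structured records. This is a hand-rolled sketch: real deployments export spans to a tracing backend (LangSmith, Langfuse, etc.) rather than appending to a global list.

```python
import functools
import time

TRACE = []  # in a real system: an exporter/span processor, not a global list

def traced(kind):
    """Record name, args, success, and duration for every call of `fn`."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                TRACE.append({"kind": kind, "name": fn.__name__,
                              "args": args, "ok": True,
                              "ms": (time.perf_counter() - t0) * 1000})
                return result
            except Exception as e:
                TRACE.append({"kind": kind, "name": fn.__name__,
                              "args": args, "ok": False, "error": repr(e),
                              "ms": (time.perf_counter() - t0) * 1000})
                raise  # record the failure, then propagate it
        return inner
    return wrap

@traced("tool")
def lookup(city):
    # stand-in for a real tool call
    return {"city": city, "temp_c": 21}
```

The same decorator applied to LLM-call and node-transition functions yields exactly the event stream described above: one record per call, tagged with kind, outcome, and latency.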

Recommended Tools

⚠️ Always log the full state at each node, not just the final output. You cannot debug an agent from its last message alone. State transitions and intermediate results are where bugs live.

Observability Tools

| Category | Tool | Notes |
|---|---|---|
| Framework | LangGraph | Native checkpointing and state persistence for debugging |
| Framework | LangChain | Chain-level tracing and run management |
| Framework | CrewAI | Task-level logging and agent output tracking |
| Framework | AutoGen | Conversation history and code execution logs |
| Tracing | LangSmith | Native LangGraph integration, evaluation, datasets |
| Tracing | Langfuse | Open-source, provider-agnostic, SDKs for all frameworks |
| Tracing | Arize Phoenix | Open-source, optimized for large trace volumes |
| Tracing | W&B Traces | Lightweight integration with Weights & Biases |