SmolAgents is HuggingFace's minimal, code-first agent framework: the agent writes and executes Python code instead of dispatching JSON tool calls, dramatically reducing token overhead and enabling composable, readable agent logic.
Most agent frameworks communicate between the LLM and tools via JSON: the model outputs `{"tool": "search", "args": {"query": "..."}}`, your code parses it, runs the tool, and feeds the result back. It works, but it's verbose: every tool call requires parsing, schema enforcement, and error handling for malformed JSON.
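For contrast, the JSON-dispatch pattern looks roughly like this. This is a minimal sketch with a hypothetical `search` tool and a hand-rolled registry, not any particular framework's API:

```python
import json

# Hypothetical tool registry: name -> callable
def search(query: str) -> str:
    return f"results for {query!r}"

TOOLS = {"search": search}

def dispatch(llm_output: str) -> str:
    """Parse a JSON tool call, validate it, and run the tool."""
    try:
        call = json.loads(llm_output)
        tool = TOOLS[call["tool"]]
    except (json.JSONDecodeError, KeyError) as e:
        return f"Error: malformed tool call ({e})"
    return tool(**call["args"])

print(dispatch('{"tool": "search", "args": {"query": "AAPL price"}}'))
```

Every one of those failure branches is glue code the code-first approach avoids.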
SmolAgents takes a different approach: the model writes Python code to call tools. Instead of a JSON dispatch object, the model outputs:
```python
result = search(query="current AAPL stock price")
final_answer(f"Apple is trading at {result}")
```
This is more compact (fewer tokens per action), more composable (tools can be chained with native Python logic), and more readable for debugging. The trade-off is that the model must understand Python — which modern LLMs do very well.
SmolAgents is also small: the core library is around 1,000 lines of code. This makes it auditable, forkable, and fast to understand.
The execution loop is straightforward:
1. User prompt → agent
2. Agent calls LLM with: task + available tools + history
3. LLM returns Python code (in a ```python ... ``` block)
4. SmolAgents executes the code in a local interpreter
5. Captured output is appended to history as "Observation"
6. Loop repeats until final_answer() is called or max_steps reached
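The loop above can be sketched in plain Python. This is a simplified mock, not the actual SmolAgents source; `fake_llm` stands in for a real model call:

```python
import contextlib
import io
import re

def fake_llm(history: str) -> str:
    # A real agent would call a model here; this mock answers immediately.
    return '```python\nfinal_answer("42")\n```'

def run_agent(task: str, max_steps: int = 6) -> str:
    answer = {}

    def final_answer(value):  # terminates the loop when called
        answer["value"] = value

    history = f"Task: {task}"
    for _ in range(max_steps):
        reply = fake_llm(history)
        # Extract the ```python ... ``` block from the model reply
        code = re.search(r"```python\n(.*?)```", reply, re.S).group(1)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):  # capture printed output
            exec(code, {"final_answer": final_answer})
        if "value" in answer:
            return answer["value"]
        history += f"\nObservation: {buf.getvalue()}"  # feed result back
    return "max_steps reached"

print(run_agent("What is 6 * 7?"))  # → 42
```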
Tools are Python functions decorated with @tool. The docstring becomes the tool description in the prompt. The function signature defines arguments and types.
SmolAgents supports two agent types: CodeAgent (generates and runs Python code — more powerful, requires a safe execution environment) and ToolCallingAgent (classic JSON tool dispatch — more constrained but safer for production).
```bash
pip install smolagents
```
```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# Use a hosted model via the HuggingFace Inference API
model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")

# DuckDuckGoSearchTool is a built-in tool
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

result = agent.run(
    "What are the top 3 most-starred Python repos on GitHub today?"
)
print(result)
```
The agent will write code like `results = search("top starred Python repos GitHub 2024")`, run it, inspect the results, and call `final_answer(...)` when it has enough information.
You can also use OpenAI or Anthropic models by swapping the model class:
```python
from smolagents import LiteLLMModel

model = LiteLLMModel("anthropic/claude-sonnet-4-5")
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
```
```python
from smolagents import tool

@tool
def get_weather(city: str) -> str:
    '''Get the current weather for a city.

    Args:
        city: The city name (e.g. "London", "Tokyo").

    Returns:
        A string describing current weather conditions.
    '''
    # Replace with a real weather API call
    import random
    conditions = ["sunny", "cloudy", "rainy", "snowy"]
    temp = random.randint(-5, 35)
    return f"{city}: {random.choice(conditions)}, {temp}°C"
```
```python
@tool
def calculate(expression: str) -> float:
    '''Evaluate a mathematical expression safely.

    Args:
        expression: A valid Python math expression (e.g. "2 ** 10 + 3.14").

    Returns:
        The numeric result.
    '''
    import ast, operator
    # Safe eval via the AST: only literals and binary arithmetic allowed
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow}

    def eval_(node):
        if isinstance(node, ast.Constant):  # ast.Num is deprecated since 3.8
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](eval_(node.left), eval_(node.right))
        raise ValueError("Unsupported expression")

    return eval_(ast.parse(expression, mode='eval').body)
```
```python
agent = CodeAgent(
    tools=[get_weather, calculate],
    model=model,
    max_steps=10,
)
result = agent.run("What's the temperature difference between London and Tokyo?")
```
SmolAgents integrates natively with the HuggingFace ecosystem. You can run models locally with TransformersModel or use the Inference API with HfApiModel:
```python
from smolagents import TransformersModel

# Run fully locally (requires a GPU for large models)
model = TransformersModel(
    model_id="Qwen/Qwen2.5-Coder-7B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)
agent = CodeAgent(tools=[...], model=model)
```
For the Inference API (no local GPU needed):
```python
import os
from smolagents import HfApiModel

model = HfApiModel(
    model_id="Qwen/Qwen2.5-72B-Instruct",
    token=os.environ["HF_TOKEN"],
)
```
The best models for SmolAgents tasks (as of 2024) are Qwen2.5-Coder series for code generation tasks and GPT-4o or Claude for general-purpose agents where reasoning depth matters more than speed.
CodeAgent writes Python code and executes it. The code can use loops, conditionals, variable assignment, and chain multiple tool calls in a single step. This is powerful for complex tasks but requires a safe code execution environment — the code runs on your machine.
ToolCallingAgent uses classic JSON tool dispatch (similar to LangChain agents or Anthropic tool use). It's more constrained but doesn't require a code interpreter, making it safer for production deployments where you can't run arbitrary code.
```python
from smolagents import ToolCallingAgent

# JSON dispatch mode: safer for production
agent = ToolCallingAgent(
    tools=[get_weather, calculate],
    model=model,
)
```
Use CodeAgent when: tasks require complex logic, chaining tools with conditions, or you need the model to do data processing. Use ToolCallingAgent when: you're in a security-sensitive environment, tools have side effects (DB writes, API calls), or you need strict input validation before execution.
Code execution is not sandboxed by default. CodeAgent runs generated code in your local Python interpreter. A model emitting `import os; os.system("rm -rf /")` would be catastrophic. Always run SmolAgents in a Docker container, or use the E2B executor for cloud sandboxing in production:

```python
from smolagents import CodeAgent

agent = CodeAgent(
    tools=[...],
    model=model,
    executor_type="e2b",  # run generated code in an isolated cloud sandbox
)
```
Tool docstrings are your prompt. The model only knows what a tool does from its docstring. Vague docstrings lead to wrong tool selection. Be specific about what the tool returns, when to use it, and what its limitations are.
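To see why this matters, note that the description the model sees can be derived mechanically from the signature and docstring. This is a sketch of the idea, not SmolAgents' actual prompt template:

```python
import inspect

def describe_tool(fn) -> str:
    """Render a tool's signature and docstring as a prompt fragment."""
    sig = inspect.signature(fn)
    doc = inspect.getdoc(fn) or "(no description!)"
    return f"- {fn.__name__}{sig}: {doc.splitlines()[0]}"

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    ...

print(describe_tool(get_weather))
# - get_weather(city: str) -> str: Get the current weather for a city.
```

Whatever you leave out of the docstring, the model simply never learns.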
Max steps can be hit on complex tasks. The default is 6 steps. For tasks requiring many tool calls, increase `max_steps` carefully; each step costs LLM tokens. Add a step callback to log or monitor progress.
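The callback pattern itself is simple and can be sketched without the library (a mock loop and a hypothetical `log_step` callback, not the SmolAgents callback signature):

```python
def run_with_callbacks(steps, callbacks):
    """Mock agent loop: invoke every callback after each step."""
    for i, step in enumerate(steps, start=1):
        for cb in callbacks:
            cb(i, step)  # hook for logging, metrics, early warnings
    return len(steps)

seen = []
def log_step(n, step):
    seen.append(f"step {n}: {step}")

run_with_callbacks(["search", "calculate", "final_answer"], [log_step])
print(seen[-1])  # → step 3: final_answer
```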
SmolAgents from HuggingFace provides a minimal, lightweight framework for building LLM agents that prioritizes simplicity and transparency over feature richness. Its core design philosophy is that agents should be understandable codebases that developers can read, modify, and debug rather than black-box frameworks that hide complexity behind abstractions.
| Agent Type | Action Format | Strengths | Limitations |
|---|---|---|---|
| CodeAgent | Python code snippets | Flexible, composable | Requires sandbox |
| ToolCallingAgent | JSON tool calls | Structured, safe | Less flexible |
| MultiStepAgent | Thought + action pairs | Transparent reasoning | More tokens |
The CodeAgent in SmolAgents allows the LLM to write Python code as its action format rather than calling predefined tools via JSON. This provides much greater flexibility — the agent can compose tool calls, apply transformations, use control flow, and handle complex data processing all within a single code block. The generated code is executed in a controlled Python interpreter with access to a predefined set of allowed imports, balancing capability with safety.
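The "allowed imports" idea can be sketched with a pre-execution AST check. This is a simplified illustration, not SmolAgents' actual interpreter (which evaluates the AST node by node); the whitelist here is an arbitrary example:

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "re"}  # example whitelist

def check_imports(code: str) -> list[str]:
    """Return the names of any imports outside the whitelist."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        violations += [n for n in names if n.split(".")[0] not in ALLOWED_IMPORTS]
    return violations

print(check_imports("import math\nimport os"))  # → ['os']
```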
SmolAgents integrates natively with HuggingFace Hub for tool sharing, allowing developers to publish reusable tool definitions that other agents can import with a single line. The Hub-hosted tools follow a standardized interface with typed inputs and outputs, making it straightforward to compose third-party tools into custom pipelines without writing adapter code. This ecosystem approach to tool sharing is one of SmolAgents' key differentiators from more self-contained frameworks.
SmolAgents' multi-agent support allows one agent to delegate sub-tasks to other specialized agents. A manager agent breaks down a complex task, routes sub-tasks to domain-specialist agents (a coding agent, a research agent, a data analysis agent), and synthesizes their outputs into a final response. The manager communicates with sub-agents via a simple interface that looks like a tool call, keeping the architectural complexity manageable while enabling powerful hierarchical task decomposition.
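The delegation pattern can be sketched in plain Python. Here each "agent" is just a function and the manager calls every specialist; these are hypothetical names, not the SmolAgents managed-agent API, and a real manager would let the LLM decide the routing:

```python
def research_agent(task: str) -> str:
    return f"[research] findings on: {task}"

def coding_agent(task: str) -> str:
    return f"[code] implementation of: {task}"

SPECIALISTS = {"research": research_agent, "code": coding_agent}

def manager(task: str) -> str:
    """Split a task, route each part to a specialist, merge the answers."""
    parts = []
    for name, agent in SPECIALISTS.items():
        # Sub-agent invocation looks just like a tool call to the manager
        parts.append(agent(f"{name} portion of {task!r}"))
    return " | ".join(parts)

print(manager("benchmark sorting algorithms"))
```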
Observability in SmolAgents pipelines is built around the reasoning trace — the sequence of thoughts, tool calls, and observations the agent generates during task execution. Logging the full trace to a structured store (LangSmith, Weights & Biases, or a custom database) enables post-hoc analysis of failure modes, step counts, and tool usage patterns. Comparing traces of successful and failed runs on the same task type reveals the decision points where the agent diverges, guiding targeted improvements to the system prompt or tool definitions.
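A minimal structured trace recorder might look like the following sketch; in practice LangSmith, W&B, or a database would replace the in-memory list:

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class TraceStep:
    step: int
    thought: str
    tool: str
    observation: str
    ts: float = field(default_factory=time.time)

trace: list[TraceStep] = []

def record(step: int, thought: str, tool: str, observation: str) -> None:
    trace.append(TraceStep(step, thought, tool, observation))

record(1, "need the price", "search", "AAPL at $228")
record(2, "enough info", "final_answer", "$228")

# Persist as JSON lines for post-hoc analysis of failures and step counts
lines = [json.dumps(asdict(s)) for s in trace]
print(len(lines), "steps logged")
```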
The SmolAgents framework is intentionally small — the core codebase is under 1,000 lines — making it easy to read and understand completely before using it in production. This transparency is a deliberate design choice that contrasts with larger frameworks where critical behavior is buried in nested abstractions. Teams that need to debug agent behavior, customize planning logic, or integrate with non-standard infrastructure benefit significantly from a codebase where every execution path can be traced directly to readable source code.
Error recovery patterns in SmolAgents are handled by giving the agent visibility into tool call failures through the observation mechanism. When a tool call raises an exception, the error message becomes the observation for that step, and the agent can reason about whether to retry with different parameters, try an alternative tool, or escalate to the user. Agents that handle errors gracefully tend to have explicit instructions in the system prompt about preferred recovery strategies for common failure modes like network timeouts, rate limits, and empty search results.
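The error-as-observation pattern is easy to sketch; `flaky_search` here is a stand-in tool that fails once before succeeding:

```python
calls = {"n": 0}

def flaky_search(query: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("network timeout")
    return f"results for {query!r}"

def observe(tool, **kwargs) -> str:
    """Run a tool; on failure, return the error text as the observation."""
    try:
        return tool(**kwargs)
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"

first = observe(flaky_search, query="AAPL")   # the agent sees the timeout...
second = observe(flaky_search, query="AAPL")  # ...and can simply retry
print(first)
print(second)
```

Because the exception text lands in the history instead of crashing the loop, the model can decide on the next step itself.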