Tool Use

Agent Skills

Modular instruction + tool bundles loaded dynamically at runtime — extend agent capabilities without retraining. A skill packages a prompt fragment, a set of tools, and optionally example interactions into a reusable unit.

Dynamic loading · Reusable modules · No retraining required

SECTION 01

What are agent skills

An agent's capabilities are typically defined at build time: you write the system prompt, define the tools, and deploy. To add new capabilities, you update code and redeploy.

Agent skills flip this model: a skill is a self-contained, loadable bundle of prompt instructions + tools that can be added to an agent at runtime without code changes. The agent discovers what skills are available, loads the ones relevant to the current task, and gains those capabilities on-demand.

This is analogous to browser extensions, VS Code plugins, or MCP servers: a plugin architecture for agents. Examples: a "code review" skill adds code analysis tools and a reviewing persona; a "customer support" skill adds tools to query order history and a support-tone system prompt; a "research" skill adds web search and citation tools.

SECTION 02

Anatomy of a skill

from dataclasses import dataclass, field

@dataclass
class AgentSkill:
    name: str                    # "code_review"
    description: str             # Used by the agent to decide when to load
    system_prompt_fragment: str  # Appended to the base system prompt
    tools: list[dict]            # Tool definitions (Anthropic format)
    examples: list[dict] = field(default_factory=list)  # Optional few-shot examples
    version: str = "1.0"
    tags: list[str] = field(default_factory=list)       # e.g. ["coding", "quality"]

# Example: a code review skill
code_review_skill = AgentSkill(
    name="code_review",
    description="Review Python code for bugs, style, and security issues.",
    system_prompt_fragment='''When reviewing code:
- Check for security vulnerabilities (injection, auth, secrets in code)
- Verify error handling covers all edge cases
- Assess readability and adherence to PEP 8
- Look for performance issues (N+1 queries, unnecessary loops)
Always provide specific line references and improvement suggestions.''',
    tools=[
        {
            "name": "run_linter",
            "description": "Run pylint/ruff on a code snippet.",
            "input_schema": {"type": "object", "properties": {"code": {"type": "string"}}, "required": ["code"]}
        },
        {
            "name": "check_security",
            "description": "Check for common security issues using bandit.",
            "input_schema": {"type": "object", "properties": {"code": {"type": "string"}}, "required": ["code"]}
        },
    ],
    tags=["coding", "quality"]
)

SECTION 03

Building a skill registry

import json
from pathlib import Path

class SkillRegistry:
    def __init__(self, skills_dir: str):
        self._skills: dict[str, AgentSkill] = {}
        self._load_from_dir(skills_dir)

    def _load_from_dir(self, directory: str):
        '''Load skills from JSON files in a directory.'''
        for path in Path(directory).glob("*.skill.json"):
            data = json.loads(path.read_text())
            skill = AgentSkill(**data)
            self._skills[skill.name] = skill
            print(f"Loaded skill: {skill.name} v{skill.version}")

    def register(self, skill: AgentSkill):
        self._skills[skill.name] = skill

    def get(self, name: str) -> AgentSkill | None:
        return self._skills.get(name)

    def find_by_task(self, task_description: str) -> list[AgentSkill]:
        '''Use embedding similarity to find relevant skills.'''
        from sentence_transformers import SentenceTransformer
        if not self._skills:
            return []
        # In production, load the model once and precompute skill embeddings
        model = SentenceTransformer("all-MiniLM-L6-v2")
        # Normalized embeddings make the dot product below a cosine similarity,
        # so the 0.5 threshold is meaningful
        task_emb = model.encode([task_description], normalize_embeddings=True)
        skill_descs = [s.description for s in self._skills.values()]
        skill_embs = model.encode(skill_descs, normalize_embeddings=True)
        sims = (skill_embs @ task_emb.T).flatten()
        sorted_skills = sorted(
            zip(self._skills.values(), sims),
            key=lambda x: x[1], reverse=True
        )
        return [skill for skill, sim in sorted_skills if sim > 0.5][:3]

registry = SkillRegistry("./skills/")
registry.register(code_review_skill)
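
Skills can also live on disk as data. A minimal sketch of a file the registry's `*.skill.json` loader above would discover; the skill name and all field values here are illustrative:

```python
import json
from pathlib import Path

# Hypothetical skill file; the keys mirror the AgentSkill fields so that
# AgentSkill(**data) works directly in the loader.
skill_data = {
    "name": "summarize",
    "description": "Summarize long documents into key points.",
    "system_prompt_fragment": "When summarizing, lead with the main conclusion.",
    "tools": [],
    "examples": [],
    "version": "1.0",
    "tags": ["writing"],
}

skills_dir = Path("./skills")
skills_dir.mkdir(exist_ok=True)
path = skills_dir / "summarize.skill.json"
path.write_text(json.dumps(skill_data, indent=2))

# The registry's glob("*.skill.json") would now discover this file
loaded = json.loads(path.read_text())
print(loaded["name"], loaded["version"])  # summarize 1.0
```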

SECTION 04

Dynamic skill loading

import anthropic

client = anthropic.Anthropic()
BASE_SYSTEM = "You are a helpful assistant."

def build_agent_context(task: str, registry: SkillRegistry) -> tuple[str, list[dict]]:
    '''Load relevant skills and build system prompt + tools.'''
    relevant_skills = registry.find_by_task(task)
    if not relevant_skills:
        return BASE_SYSTEM, []

    # Compose system prompt
    system_parts = [BASE_SYSTEM]
    all_tools = []
    for skill in relevant_skills:
        system_parts.append(
            f"\n## {skill.name.upper()} SKILL\n{skill.system_prompt_fragment}"
        )
        all_tools.extend(skill.tools)

    return "\n".join(system_parts), all_tools

def skilled_agent(user_query: str) -> str:
    system, tools = build_agent_context(user_query, registry)
    kwargs = {"model": "claude-sonnet-4-5", "max_tokens": 1024,
              "system": system,
              "messages": [{"role": "user", "content": user_query}]}
    if tools:
        kwargs["tools"] = tools

    response = client.messages.create(**kwargs)
    # Simplified: returns the first content block's text. A full agent loop
    # would check response.stop_reason for "tool_use", execute the tool, and
    # send the result back before returning a final answer.
    return response.content[0].text

# Skills are loaded automatically based on the task
print(skilled_agent("Please review this Python function for security issues: def get_user(id): return db.query(f'SELECT * FROM users WHERE id={id}')"))
# → code_review skill loaded automatically

SECTION 05

Skill composition

Skills can be composed: a "research report" workflow might load the "web research" skill + "writing" skill + "citation" skill simultaneously. The agent gets the combined tools and prompt fragments from all three.

def compose_skills(skill_names: list[str], registry: SkillRegistry) -> tuple[str, list[dict]]:
    '''Explicitly compose named skills.'''
    parts = [BASE_SYSTEM]
    tools = []
    for name in skill_names:
        skill = registry.get(name)
        if skill:
            parts.append(f"\n## {name.upper()}\n{skill.system_prompt_fragment}")
            tools.extend(skill.tools)
    return "\n".join(parts), tools

# For a complex research task, compose multiple skills
system, tools = compose_skills(
    ["web_research", "writing", "citation_formatting"],
    registry
)

Skill composition requires careful prompt design to avoid conflicts — if two skills both give instructions about response format, they'll fight. Use clear section headers in the system prompt and test composed skill sets end-to-end before deploying.

SECTION 06

MCP plugins as skills

MCP servers are the natural evolution of agent skills: they expose tools, resources, and prompts via a standard protocol. An MCP marketplace (like a skills registry) where agents can discover and load capabilities at runtime is the production-grade implementation of the skills pattern.

In Cowork mode (this application), skills are implemented exactly this way: each skill is a directory with a SKILL.md prompt file and optional tools, loaded dynamically when the agent determines it's relevant to the task. The skill registry is the /skills directory, and skill discovery is based on task description matching.

For your own agents, you can implement skills as: JSON files (simple), Python modules loaded via importlib (dynamic code), or MCP servers (most portable). MCP servers give you the best separation of concerns — the skill's tools are truly independent from the agent runtime.
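
The importlib option can be sketched in a few lines. The convention that a skill module exposes a module-level SKILL object is an assumption for illustration, not a standard:

```python
import importlib.util
from pathlib import Path

def load_skill_module(path: str):
    """Load a skill defined as a Python module (sketch). Assumes the module
    exposes a module-level SKILL object, e.g. an AgentSkill instance."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    return getattr(module, "SKILL")

# Usage (hypothetical path): skill = load_skill_module("skills/code_review.py")
```

Because exec_module runs arbitrary code, this approach is only appropriate for skill files you control.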

SECTION 07

Gotchas

Prompt bloat degrades quality. Loading 5 skills each with 200 tokens of system prompt instructions adds 1,000 tokens to every request. Beyond about 3-4 loaded skills, the model can start to lose track of earlier instructions. Be selective: load only the skills directly relevant to the current task.

Tool name collisions break agents. If two skills both define a tool called "search", the model gets confused. Namespace skill tools: "code_review__lint" instead of "lint". Or use a skill prefix convention enforced by the registry.
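
A registry-enforced prefix convention can be sketched in a few lines, assuming Anthropic-style tool dicts with a "name" key:

```python
def namespace_tools(skill_name: str, tools: list[dict]) -> list[dict]:
    """Prefix each tool name with its skill so composed skills can't collide."""
    namespaced = []
    for tool in tools:
        t = dict(tool)  # shallow copy; don't mutate the registry's copy
        t["name"] = f"{skill_name}__{t['name']}"
        namespaced.append(t)
    return namespaced

tools = namespace_tools("code_review", [{"name": "lint"}, {"name": "check_security"}])
print([t["name"] for t in tools])  # ['code_review__lint', 'code_review__check_security']
```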

Skills need versioning. When you improve a skill's prompts or add new tools, you want to roll out the change gradually — not break all running agents simultaneously. Store skill version in the registry and let agents pin to a major version ("web_research@1").

Agent Skill Design Patterns

Agent skills are modular capability units that encapsulate a specific ability — web search, code execution, API calls, file manipulation — behind a consistent interface. Designing skills well determines whether an agent can reliably compose them to solve complex tasks or gets stuck on ambiguous tool selection and incorrect parameter construction.

Skill Type | Interface Pattern | Error Handling                   | Example
Retrieval  | query → documents | Return empty list on miss        | web_search, vector_lookup
Action     | params → status   | Raise on failure, return receipt | send_email, create_file
Transform  | input → output    | Validate schema in/out           | parse_json, summarize
Compound   | goal → result     | Sub-skill error propagation      | research_topic, book_meeting

Retrieval skills should always return structured metadata alongside content: source URL, timestamp, confidence score. This allows the orchestrator to reason about result quality rather than treating all retrieved content as equally reliable. Action skills should return receipts — unique identifiers or confirmation tokens — so the agent can reference the completed action in subsequent steps without re-executing it.
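
The return shapes described above can be sketched as dataclasses; all field names here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class RetrievalResult:
    """Retrieval output: content plus the metadata the orchestrator
    needs to judge result quality."""
    content: str
    source_url: str
    timestamp: str        # ISO 8601
    confidence: float     # 0.0 - 1.0

@dataclass
class ActionReceipt:
    """Action output: a unique id the agent can reference in later
    steps without re-executing the action."""
    action: str
    receipt_id: str
    status: str           # "ok" | "failed"

receipt = ActionReceipt(action="send_email", receipt_id="msg_8f2a", status="ok")
print(receipt.receipt_id)  # msg_8f2a
```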

Compound skills compose simpler primitives into higher-level capabilities. A "research_topic" skill might internally invoke web_search, read_page, and summarize in sequence. Exposing compound skills to the top-level agent reduces the planning horizon required, but introduces a trade-off: the agent loses fine-grained control over intermediate steps and cannot recover gracefully if a sub-skill fails in an unexpected way.

Skill versioning is essential for long-running agent deployments. When a skill interface changes — new required parameters, modified return schema — agents that were trained or prompted to use the old interface will fail silently or produce incorrect results. Semantic versioning for skill APIs, with backward compatibility guarantees within major versions, allows agents to safely call skills without needing to be retrained every time the underlying implementation is updated.

Skill discovery mechanisms allow agents to dynamically learn about available capabilities rather than having a fixed set of tools hardcoded at design time. MCP (Model Context Protocol) and OpenAI function calling both support listing available tools at runtime. Dynamic skill discovery enables agent architectures where new capabilities can be added to the tool registry without modifying the agent itself, and allows the agent to gracefully degrade when certain skills are temporarily unavailable.

Testing agent skills in isolation, before integrating them into a full agent loop, dramatically accelerates development. Unit tests for individual skills verify that correct inputs produce correct outputs and that invalid inputs are rejected gracefully. Integration tests verify that the agent correctly selects and parameterizes skills given realistic natural language inputs, catching schema mismatch errors before they surface in production conversations.
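
A minimal isolation test for a hypothetical transform skill, following this pattern:

```python
# summarize_numbers is an illustrative skill implementation; the test
# structure (valid input -> correct output, invalid input -> rejected)
# is what carries over to real skills.
def summarize_numbers(values: list[float]) -> dict:
    if not values:
        raise ValueError("empty input")
    return {"count": len(values), "mean": sum(values) / len(values)}

def test_valid_input():
    out = summarize_numbers([1.0, 2.0, 3.0])
    assert out == {"count": 3, "mean": 2.0}

def test_invalid_input_rejected():
    try:
        summarize_numbers([])
    except ValueError:
        pass
    else:
        raise AssertionError("empty input should be rejected")

test_valid_input()
test_invalid_input_rejected()
print("all skill tests passed")
```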