The mechanism by which LLMs call external functions — defining tools as JSON schemas, receiving structured call requests from the model, executing them, and returning results.
Before tool use existed, people would put instructions like "respond with JSON in format {action: ..., args: ...}" in the prompt and then parse the output with regex. It worked sometimes and broke spectacularly other times.
Native tool use (also called function calling) is different: you define tools as structured JSON schemas, and the model returns a structured call object — not free text it asks you to parse, but a first-class API type designed for reliable machine parsing. The model knows the exact set of available tools and their input requirements, reducing hallucinated tool calls dramatically.
The flow:
1. You send a request with a `tools` array defining the available functions.
2. The model responds with a `tool_use` content block containing the tool name and input arguments.
3. Your code executes the tool and sends back a `tool_result` block.
4. The model incorporates the result and either answers or requests more tools.

Your code is the runtime. The model is the planner. The API is the communication channel between them.
```python
import anthropic

client = anthropic.Anthropic()

# Define tools as JSON schema objects
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York, NY'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Defaults to celsius."
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL, MSFT"}
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather like in Paris and London?"}]
)

print(response.stop_reason)  # "tool_use"
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
# Tool: get_weather, Input: {'location': 'Paris', 'unit': 'celsius'}
# Tool: get_weather, Input: {'location': 'London', 'unit': 'celsius'}
```
Now close the loop: execute each requested tool, append the results, and call the API again until the model produces a final text answer.

```python
import anthropic

client = anthropic.Anthropic()

def get_weather(location: str, unit: str = "celsius") -> dict:
    '''Mock weather API — replace with real implementation.'''
    data = {"Paris": {"temp": 18, "conditions": "Partly cloudy"},
            "London": {"temp": 14, "conditions": "Overcast"}}
    d = data.get(location, {"temp": 20, "conditions": "Unknown"})
    return {"location": location, "temperature": d["temp"], "unit": unit,
            "conditions": d["conditions"]}

def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,  # defined in previous section
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # No more tool calls — extract text response
            return next((b.text for b in response.content if hasattr(b, "text")), "")
        if response.stop_reason == "tool_use":
            # Add assistant's response (including tool_use blocks) to history
            messages.append({"role": "assistant", "content": response.content})
            # Execute all tool calls and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    if block.name == "get_weather":
                        result = get_weather(**block.input)
                    elif block.name == "get_stock_price":
                        result = {"ticker": block.input["ticker"], "price": 189.42}
                    else:
                        result = {"error": f"Unknown tool: {block.name}"}
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result)
                    })
            # Return tool results to the model
            messages.append({"role": "user", "content": tool_results})

result = run_with_tools("Compare the weather in Paris and London.")
print(result)
```
Claude can request multiple tools in a single turn. This is efficient — all tool calls in one response can be executed concurrently:
```python
import asyncio

async def get_weather_async(location: str) -> dict:
    await asyncio.sleep(0.1)  # simulate API latency
    return {"location": location, "temp": 18}

async def execute_tool_calls_parallel(tool_use_blocks):
    '''Execute all tool calls concurrently.'''
    tasks = []
    for block in tool_use_blocks:
        if block.name == "get_weather":
            tasks.append((block.id, get_weather_async(block.input["location"])))
    results = await asyncio.gather(*[t[1] for t in tasks])
    return [
        {"type": "tool_result", "tool_use_id": tasks[i][0], "content": str(r)}
        for i, r in enumerate(results)
    ]

# If Claude calls get_weather for Paris AND London in one response,
# we execute both API calls simultaneously — 2× faster than sequential
```
Always execute parallel tool calls concurrently. Sequential execution unnecessarily doubles your latency — Claude grouped them in one response precisely because they're independent.
The same pattern works with OpenAI's Chat Completions API; only the schema wrapper and message shapes differ:

```python
from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Tokyo?"}]
)

# Parse tool call
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Execute and return result
weather_result = get_weather(**args)  # get_weather as defined earlier
messages = [
    {"role": "user", "content": "Weather in Tokyo?"},
    response.choices[0].message,  # include assistant's tool_call message
    {"role": "tool", "tool_call_id": tool_call.id, "content": str(weather_result)}
]
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```
Always handle stop_reason == "tool_use" explicitly. If you check for "end_turn" only and return early, you silently discard tool calls and the user gets a truncated response with no explanation.
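A minimal dispatcher sketch makes the exhaustive handling explicit. The `stop_reason` strings are real API values; the returned action labels are illustrative placeholders for your own handlers:

```python
# Sketch: branch on every stop_reason rather than checking "end_turn" alone.
# The returned action strings are illustrative labels, not API values.
def next_action(stop_reason: str) -> str:
    if stop_reason == "tool_use":
        return "execute_tools"      # run the requested tools, send tool_results back
    if stop_reason == "end_turn":
        return "return_text"        # model finished; extract the text blocks
    if stop_reason == "max_tokens":
        return "handle_truncation"  # output was cut off; raise the limit or summarize
    raise ValueError(f"Unhandled stop_reason: {stop_reason}")
```

Raising on an unknown value is deliberate: a silent fallthrough is exactly the bug this tip warns about.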
Tool results must reference the correct tool_use_id. If the model makes three parallel tool calls and you return only two results, or results with mismatched IDs, the API rejects the request with a 400 error. Always return exactly one result per tool call, each matched to its tool_use_id.
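One way to guarantee the one-result-per-call invariant is to build results directly from the tool_use blocks, so an ID can never be skipped. A sketch using plain dicts in place of SDK objects; `executor` is a hypothetical dispatch callback:

```python
def build_tool_results(tool_use_blocks: list[dict], executor) -> list[dict]:
    # Iterate over the blocks themselves so every tool_use_id gets a result,
    # even when a tool raises: a failure becomes an error result, not a gap.
    results = []
    for block in tool_use_blocks:
        try:
            content = str(executor(block["name"], block["input"]))
        except Exception as exc:
            content = f"error: {exc}"
        results.append({"type": "tool_result",
                        "tool_use_id": block["id"],
                        "content": content})
    return results
```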
Keep descriptions accurate and specific. The model decides which tool to call based on descriptions alone. Vague descriptions ("process input") lead to wrong selections. Specific descriptions ("Search the web for current news and factual information; use for questions about recent events") lead to correct ones.
Tool schemas are part of your prompt budget. Complex schemas with many fields and long descriptions consume significant tokens on every call. Audit your schema for verbosity — a 2,000-token tool definition repeated across 1,000 requests is 2M tokens of overhead per day.
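A rough way to audit that overhead, assuming the common heuristic of roughly 4 characters per token (real counts vary by model and tokenizer; use your provider's token-counting endpoint for exact numbers):

```python
import json

def estimate_schema_tokens(tools: list[dict]) -> int:
    # Heuristic only: serialized JSON length divided by ~4 chars per token.
    return len(json.dumps(tools)) // 4
```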
Tool use unlocks a qualitative shift in what LLMs can do — from pattern matching on training data to actively querying live systems, running code, and manipulating state. The key design decisions are: how many tools to expose (fewer is better — too many choices degrade tool selection), how to describe them (natural-language descriptions matter more than parameter names), and whether to allow parallel tool calls (yes, for independent operations).
Common failure modes: the model calls the wrong tool due to ambiguous descriptions; the model hallucinates tool arguments for tools it doesn't fully understand; the model gets stuck in tool-call loops when a tool returns an error. Mitigations: use distinct tool names, include examples in descriptions, set a max-turns limit, and handle tool errors gracefully by returning structured error messages the model can reason about.
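Those mitigations combine naturally into one bounded loop. A sketch where `call_model` and `execute_tool` are hypothetical callbacks standing in for the API call and your dispatch logic, and responses are plain dicts:

```python
MAX_TURNS = 10  # hard cap so a confused model cannot loop forever

def agent_loop(call_model, execute_tool, messages: list) -> str:
    for _ in range(MAX_TURNS):
        response = call_model(messages)  # dict: stop_reason, text, tool_calls
        if response["stop_reason"] != "tool_use":
            return response["text"]
        results = []
        for call in response["tool_calls"]:
            try:
                content = str(execute_tool(call["name"], call["input"]))
            except Exception as exc:
                # Return a structured error the model can reason about,
                # instead of crashing the loop
                content = str({"error": str(exc), "tool": call["name"]})
            results.append({"type": "tool_result",
                            "tool_use_id": call["id"],
                            "content": content})
        messages.append({"role": "user", "content": results})
    return "Stopped: exceeded MAX_TURNS tool-call limit."
```

The turn cap converts an infinite retry loop into a visible, debuggable stop condition.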
| Pattern | Description | When to Use | Risk |
|---|---|---|---|
| Single tool call | Model calls one tool, gets result, responds | Simple lookups, calculations | Low — easy to audit |
| Sequential chaining | Output of tool A feeds into tool B | Multi-step workflows | Medium — error propagation |
| Parallel calls | Multiple independent tools called simultaneously | Fetching from multiple sources | Medium — harder to debug |
| Agentic loops | Model iterates tool calls until task complete | Complex open-ended tasks | High — needs loop limit, human gate |
| Human-in-loop | Pause for human approval before high-stakes tools | Write/delete/send operations | Low — safest for irreversible actions |
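The human-in-the-loop row of the table can be as simple as a gate in front of the tool executor. A sketch in which the destructive tool names and the `approve` callback are illustrative assumptions:

```python
# Example names; populate this set with your own irreversible operations.
DESTRUCTIVE_TOOLS = {"send_email", "delete_file", "execute_payment"}

def gate_tool_call(name: str, tool_input: dict, approve) -> bool:
    # Read-only tools run automatically; irreversible ones wait for a human.
    if name not in DESTRUCTIVE_TOOLS:
        return True
    return approve(name, tool_input)  # e.g. prompt an operator or check a queue
```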