The mechanism by which LLMs call external functions — defining tools as JSON schemas, receiving structured call requests from the model, executing them, and returning results.
Before tool use existed, people would put instructions like "respond with JSON in format {action: ..., args: ...}" in the prompt and then parse the output with regex. It worked sometimes and broke spectacularly other times.
Native tool use (also called function calling) is different: you define tools as structured JSON schemas, and the model returns a structured call object — not free text it asks you to parse, but a first-class API type designed for reliable machine parsing. The model knows the exact set of available tools and their input requirements, reducing hallucinated tool calls dramatically.
The flow:
1. You send a request with a `tools` array defining the available functions.
2. The model responds with a `tool_use` content block containing the tool name and input arguments.
3. Your code executes the tool and sends back a `tool_result` block.
4. The model incorporates the result and either answers or requests more tools.

Your code is the runtime. The model is the planner. The API is the communication channel between them.
```python
import anthropic

client = anthropic.Anthropic()

# Define tools as JSON schema objects
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York, NY'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Defaults to celsius."
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL, MSFT"}
            },
            "required": ["ticker"]
        }
    }
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather like in Paris and London?"}]
)

print(response.stop_reason)  # "tool_use"
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
# Tool: get_weather, Input: {'location': 'Paris', 'unit': 'celsius'}
# Tool: get_weather, Input: {'location': 'London', 'unit': 'celsius'}
```
Now close the loop: execute each requested tool, append the results, and call the API again until the model produces a final text answer.

```python
import anthropic

client = anthropic.Anthropic()

def get_weather(location: str, unit: str = "celsius") -> dict:
    '''Mock weather API — replace with real implementation.'''
    data = {"Paris": {"temp": 18, "conditions": "Partly cloudy"},
            "London": {"temp": 14, "conditions": "Overcast"}}
    d = data.get(location, {"temp": 20, "conditions": "Unknown"})
    return {"location": location, "temperature": d["temp"], "unit": unit,
            "conditions": d["conditions"]}

def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,  # defined in previous section
            messages=messages
        )
        if response.stop_reason == "end_turn":
            # No more tool calls — extract text response
            return next((b.text for b in response.content if hasattr(b, "text")), "")
        if response.stop_reason == "tool_use":
            # Add assistant's response (including tool_use blocks) to history
            messages.append({"role": "assistant", "content": response.content})
            # Execute all tool calls and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    if block.name == "get_weather":
                        result = get_weather(**block.input)
                    elif block.name == "get_stock_price":
                        result = {"ticker": block.input["ticker"], "price": 189.42}
                    else:
                        result = {"error": f"Unknown tool: {block.name}"}
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result)
                    })
            # Return tool results to the model
            messages.append({"role": "user", "content": tool_results})

result = run_with_tools("Compare the weather in Paris and London.")
print(result)
```
Claude can request multiple tools in a single turn. This is efficient — all tool calls in one response can be executed concurrently:
```python
import asyncio

async def get_weather_async(location: str) -> dict:
    await asyncio.sleep(0.1)  # simulate API latency
    return {"location": location, "temp": 18}

async def execute_tool_calls_parallel(tool_use_blocks):
    '''Execute all tool calls concurrently.'''
    tasks = []
    for block in tool_use_blocks:
        if block.name == "get_weather":
            tasks.append((block.id, get_weather_async(block.input["location"])))
    results = await asyncio.gather(*[t[1] for t in tasks])
    return [
        {"type": "tool_result", "tool_use_id": tasks[i][0], "content": str(r)}
        for i, r in enumerate(results)
    ]

# If Claude calls get_weather for Paris AND London in one response,
# we execute both API calls simultaneously — 2× faster than sequential
```
Always execute parallel tool calls concurrently. Sequential execution unnecessarily doubles your latency — Claude grouped them in one response precisely because they're independent.
The same pattern works with OpenAI's Chat Completions API; only the schema wrapper and message shapes differ:

```python
from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Tokyo?"}]
)

# Parse tool call
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Execute and return result
weather_result = get_weather(**args)  # get_weather as defined earlier
messages = [
    {"role": "user", "content": "Weather in Tokyo?"},
    response.choices[0].message,  # include assistant's tool_call message
    {"role": "tool", "tool_call_id": tool_call.id, "content": str(weather_result)}
]
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```
Always handle stop_reason == "tool_use" explicitly. If you check for "end_turn" only and return early, you silently discard tool calls and the user gets a truncated response with no explanation.
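A minimal dispatcher sketch makes the exhaustive handling explicit. The `stop_reason` strings are real API values; the returned action labels are illustrative placeholders for your own handlers:

```python
# Sketch: branch on every stop_reason rather than checking "end_turn" alone.
# The returned action strings are illustrative labels, not API values.
def next_action(stop_reason: str) -> str:
    if stop_reason == "tool_use":
        return "execute_tools"      # run the requested tools, send tool_results back
    if stop_reason == "end_turn":
        return "return_text"        # model finished; extract the text blocks
    if stop_reason == "max_tokens":
        return "handle_truncation"  # output was cut off; raise the limit or summarize
    raise ValueError(f"Unhandled stop_reason: {stop_reason}")
```

Raising on an unknown value is deliberate: a silent fallthrough is exactly the bug this tip warns about.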
Tool results must reference the correct tool_use_id. If the model makes three parallel tool calls and you return only two results, or results with mismatched IDs, the API rejects the request with a 400 error. Always return exactly one result per tool call, each matched to its tool_use_id.
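One way to guarantee the one-result-per-call invariant is to build results directly from the tool_use blocks, so an ID can never be skipped. A sketch using plain dicts in place of SDK objects; `executor` is a hypothetical dispatch callback:

```python
def build_tool_results(tool_use_blocks: list[dict], executor) -> list[dict]:
    # Iterate over the blocks themselves so every tool_use_id gets a result,
    # even when a tool raises: a failure becomes an error result, not a gap.
    results = []
    for block in tool_use_blocks:
        try:
            content = str(executor(block["name"], block["input"]))
        except Exception as exc:
            content = f"error: {exc}"
        results.append({"type": "tool_result",
                        "tool_use_id": block["id"],
                        "content": content})
    return results
```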
Keep descriptions accurate and specific. The model decides which tool to call based on descriptions alone. Vague descriptions ("process input") lead to wrong selections. Specific descriptions ("Search the web for current news and factual information; use for questions about recent events") lead to correct ones.
Tool schemas are part of your prompt budget. Complex schemas with many fields and long descriptions consume significant tokens on every call. Audit your schema for verbosity — a 2,000-token tool definition repeated across 1,000 requests is 2M tokens of overhead per day.
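A rough way to audit that overhead, assuming the common heuristic of roughly 4 characters per token (real counts vary by model and tokenizer; use your provider's token-counting endpoint for exact numbers):

```python
import json

def estimate_schema_tokens(tools: list[dict]) -> int:
    # Heuristic only: serialized JSON length divided by ~4 chars per token.
    return len(json.dumps(tools)) // 4
```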
Tool use unlocks a qualitative shift in what LLMs can do — from pattern matching on training data to actively querying live systems, running code, and manipulating state. The key design decisions are: how many tools to expose (fewer is better — too many choices degrade tool selection), how to describe them (natural-language descriptions matter more than parameter names), and whether to allow parallel tool calls (yes, for independent operations).
Common failure modes: the model calls the wrong tool due to ambiguous descriptions; the model hallucinates tool arguments for tools it doesn't fully understand; the model gets stuck in tool-call loops when a tool returns an error. Mitigations: use distinct tool names, include examples in descriptions, set a max-turns limit, and handle tool errors gracefully by returning structured error messages the model can reason about.
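Those mitigations combine naturally into one bounded loop. A sketch where `call_model` and `execute_tool` are hypothetical callbacks standing in for the API call and your dispatch logic, and responses are plain dicts:

```python
MAX_TURNS = 10  # hard cap so a confused model cannot loop forever

def agent_loop(call_model, execute_tool, messages: list) -> str:
    for _ in range(MAX_TURNS):
        response = call_model(messages)  # dict: stop_reason, text, tool_calls
        if response["stop_reason"] != "tool_use":
            return response["text"]
        results = []
        for call in response["tool_calls"]:
            try:
                content = str(execute_tool(call["name"], call["input"]))
            except Exception as exc:
                # Return a structured error the model can reason about,
                # instead of crashing the loop
                content = str({"error": str(exc), "tool": call["name"]})
            results.append({"type": "tool_result",
                            "tool_use_id": call["id"],
                            "content": content})
        messages.append({"role": "user", "content": results})
    return "Stopped: exceeded MAX_TURNS tool-call limit."
```

The turn cap converts an infinite retry loop into a visible, debuggable stop condition.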
| Pattern | Description | When to Use | Risk |
|---|---|---|---|
| Single tool call | Model calls one tool, gets result, responds | Simple lookups, calculations | Low — easy to audit |
| Sequential chaining | Output of tool A feeds into tool B | Multi-step workflows | Medium — error propagation |
| Parallel calls | Multiple independent tools called simultaneously | Fetching from multiple sources | Medium — harder to debug |
| Agentic loops | Model iterates tool calls until task complete | Complex open-ended tasks | High — needs loop limit, human gate |
| Human-in-loop | Pause for human approval before high-stakes tools | Write/delete/send operations | Low — safest for irreversible actions |
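The human-in-the-loop row of the table can be as simple as a gate in front of the tool executor. A sketch in which the destructive tool names and the `approve` callback are illustrative assumptions:

```python
# Example names; populate this set with your own irreversible operations.
DESTRUCTIVE_TOOLS = {"send_email", "delete_file", "execute_payment"}

def gate_tool_call(name: str, tool_input: dict, approve) -> bool:
    # Read-only tools run automatically; irreversible ones wait for a human.
    if name not in DESTRUCTIVE_TOOLS:
        return True
    return approve(name, tool_input)  # e.g. prompt an operator or check a queue
```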