Tool Use

Function Calling

The API-level mechanism for structured tool use: define functions as JSON schemas, and the LLM returns structured call objects — not free text — enabling reliable, type-safe tool dispatch.

SECTION 01

Function calling vs prompt-based tool use

Before native function calling existed, developers used prompts like "You have access to these tools: [list]. When you want to use a tool, output exactly: TOOL: tool_name(args)". This worked, but required fragile string parsing, offered no type safety, and the model could generate any text format it liked.
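For contrast, here is a minimal sketch of what that string parsing looked like (the TOOL: format and the tool name are illustrative, not from any real API):

```python
import re

# Fragile prompt-based tool dispatch: hope the model emitted
# exactly "TOOL: tool_name(args)" and fish it out with a regex.
TOOL_PATTERN = re.compile(r"TOOL:\s*(\w+)\((.*)\)")

def parse_tool_call(text: str):
    """Return (name, raw_args) if the model emitted a tool call, else None."""
    match = TOOL_PATTERN.search(text)
    if match is None:
        return None  # model ignored the format -- a common failure mode
    return match.group(1), match.group(2)

print(parse_tool_call("TOOL: get_weather(city=Tokyo)"))   # ('get_weather', 'city=Tokyo')
print(parse_tool_call("Tool call: get_weather in Tokyo")) # None -- slight deviation, silent failure
```

Any deviation from the expected format fails silently, and the raw argument string still has to be parsed by hand — exactly the fragility native function calling removes.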

Native function calling is fundamentally different: you define tools as first-class JSON schemas, the model is trained to output structured tool call objects (not free text), and the API validates the output. The result is reliable, machine-parseable tool dispatch that works consistently even in complex multi-tool scenarios.

Modern LLMs (Claude, GPT-4, Gemini) are specifically fine-tuned on function calling — they understand JSON schemas natively and know how to select the right tool and fill in its arguments precisely.

SECTION 02

The JSON schema contract

A function/tool definition has three parts: name, description (natural language — the LLM uses this to decide when to call the tool), and input_schema (JSON Schema defining the parameters):

weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city. Use when the user asks about weather conditions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'London' or 'Tokyo'"
            },
            "units": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature units. Default: celsius"
            }
        },
        "required": ["city"]
    }
}

The description is not documentation — it's a prompt. Write it like you're instructing the model: "Use this tool when...", "Returns X given Y", "Do not use for Z".
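To see what the contract actually enforces, the required/enum constraints from the schema above can be checked in a few lines of plain Python (a minimal sketch — `check_weather_args` is a hypothetical helper; production code would use a full JSON Schema validator such as the jsonschema package):

```python
def check_weather_args(args: dict) -> list[str]:
    """Return violations of get_weather's input_schema (empty list = valid)."""
    errors = []
    if "city" not in args:                        # "required": ["city"]
        errors.append("missing required field: city")
    elif not isinstance(args["city"], str):       # "type": "string"
        errors.append("city must be a string")
    units = args.get("units", "celsius")          # optional; default celsius
    if units not in ("celsius", "fahrenheit"):    # "enum" constraint
        errors.append(f"invalid units: {units!r}")
    return errors

print(check_weather_args({"city": "London"}))   # []
print(check_weather_args({"units": "kelvin"}))  # two violations: missing city, bad enum
```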

SECTION 03

Anthropic function calling

import anthropic, json

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"]
    }
}]

# Step 1: Model decides to call the tool
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

print(response.stop_reason)  # "tool_use"

# Extract tool call
tool_use = next(b for b in response.content if b.type == "tool_use")
print(tool_use.name)    # "get_weather"
print(tool_use.input)   # {"city": "Tokyo"}

# Step 2: Execute the tool
def get_weather(city: str) -> dict:
    return {"city": city, "temp": 18, "condition": "Partly cloudy"}

tool_result = get_weather(**tool_use.input)

# Step 3: Return result to model
final = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": json.dumps(tool_result)
        }]}
    ]
)
print(final.content[0].text)  # "The weather in Tokyo is 18°C and partly cloudy."

SECTION 04

OpenAI function calling

from openai import OpenAI
import json

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in London?"}],
    tools=tools,
    tool_choice="auto",  # or "required" or {"type": "function", "function": {"name": "get_weather"}}
)

# Check if model called a tool
msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)  # get_weather as defined in the Anthropic example above

    # Continue conversation with tool result
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "Weather in London?"},
            msg,  # assistant message with tool_calls
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        ],
        tools=tools,
    )
    print(final.choices[0].message.content)

SECTION 05

Parallel and forced function calls

Parallel calls: the model can request multiple tool calls in a single response. Claude groups independent tool calls and returns them together, signalling they can run concurrently:

import asyncio, json

# Assumes `response` is a Messages API response whose content includes
# multiple tool_use blocks, and dispatch_tool(name, input) is your own
# synchronous dispatcher (e.g. the dispatch() function in Section 06).
tool_calls = [b for b in response.content if b.type == "tool_use"]

# Execute in parallel
async def run_all(calls):
    tasks = [asyncio.to_thread(dispatch_tool, c.name, c.input) for c in calls]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_all(tool_calls))

# Return all results at once
tool_results = [
    {"type": "tool_result", "tool_use_id": c.id, "content": json.dumps(r)}
    for c, r in zip(tool_calls, results)
]

Forced function calls: use tool_choice={"type": "tool", "name": "get_weather"} (Anthropic) or tool_choice={"type": "function", "function": {"name": "get_weather"}} (OpenAI) to force a specific tool call. Useful for structured data extraction where you always want output in a specific format.
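The two forced-call payload shapes above can be wrapped in small helpers (hypothetical names) so a provider-agnostic loop can pick the right one:

```python
# Build the tool_choice payload that forces a specific tool,
# per provider. Shapes follow the Anthropic and OpenAI formats above.
def anthropic_tool_choice(name: str) -> dict:
    return {"type": "tool", "name": name}

def openai_tool_choice(name: str) -> dict:
    return {"type": "function", "function": {"name": name}}

print(anthropic_tool_choice("get_weather"))  # {'type': 'tool', 'name': 'get_weather'}
print(openai_tool_choice("get_weather"))     # {'type': 'function', 'function': {'name': 'get_weather'}}
```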

SECTION 06

Building a complete tool loop

import anthropic, json

client = anthropic.Anthropic()

TOOLS = [
    {"name": "search", "description": "Search the web.", "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}},
    {"name": "calculator", "description": "Evaluate math expressions.", "input_schema": {"type": "object", "properties": {"expression": {"type": "string"}}, "required": ["expression"]}},
]

def dispatch(name: str, args: dict) -> str:
    if name == "search":
        return f"Search results for '{args['query']}': [mock result]"
    if name == "calculator":
        # Demo only -- eval() on model-generated input is unsafe in production;
        # use a restricted expression evaluator instead.
        try: return str(eval(args["expression"]))
        except Exception as e: return f"Error: {e}"
    return "Unknown tool"

def agent_loop(user_query: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        resp = client.messages.create(
            model="claude-sonnet-4-5", max_tokens=1024,
            tools=TOOLS, messages=messages
        )
        if resp.stop_reason == "end_turn":
            return next(b.text for b in resp.content if hasattr(b, "text"))
        if resp.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": resp.content})
            tool_results = []
            for block in resp.content:
                if block.type == "tool_use":
                    result = dispatch(block.name, block.input)
                    tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
            messages.append({"role": "user", "content": tool_results})
    return "Max steps reached"

print(agent_loop("What is the square root of 1764?"))

SECTION 07

Gotchas

Always check stop_reason before parsing. If you assume the model returned a tool call when it actually ended its turn (stop_reason="end_turn"), your code will crash looking for a tool_use block that isn't there. Branch on stop_reason first.

Tool results must be returned in the same turn. Anthropic and OpenAI both expect all tool results from a multi-tool response to be returned in a single user message. Don't send tool results one-by-one in separate API calls.

The description is load-bearing. The model decides which tool to call purely from the description. If two tools have similar descriptions, the model will sometimes pick the wrong one. Make descriptions mutually exclusive and specific: "Use search for current information. Use calculator for math. Do not use search for math questions."

Tool errors should return strings, not raise exceptions. When a tool fails, return an error message string as the tool result. The model can then recover and try a different approach. Propagating exceptions kills the agent loop entirely.
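A sketch of this pattern: a generic wrapper (hypothetical name `safe_dispatch`) that converts any tool exception into a string result the model can read and recover from:

```python
# Wrap every tool at the dispatch boundary so failures come back as
# readable strings, never as raised exceptions that kill the loop.
def safe_dispatch(fn, args: dict) -> str:
    try:
        return str(fn(**args))
    except Exception as e:  # intentional catch-all at the tool boundary
        return f"Tool error ({type(e).__name__}): {e}"

def divide(a: float, b: float) -> float:
    return a / b

print(safe_dispatch(divide, {"a": 6, "b": 2}))  # 3.0
print(safe_dispatch(divide, {"a": 1, "b": 0}))  # Tool error (ZeroDivisionError): division by zero
```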

SECTION 08

Function Calling Across Providers

Feature           | Anthropic (tools)                      | OpenAI (functions)                                           | Google (function declarations)
Parallel calls    | Yes (tool_use blocks)                  | Yes (parallel_tool_calls)                                    | Yes
Forced call       | tool_choice: {type: "tool", name: "X"} | tool_choice: {"type": "function", "function": {"name": "X"}} | mode: "ANY"
Result injection  | tool_result content block              | role: "tool" message                                         | functionResponse part
Schema format     | JSON Schema subset                     | JSON Schema subset                                           | OpenAPI subset
Streaming support | Yes (input_json_delta)                 | Yes (tool_calls delta)                                       | Yes

When building provider-agnostic tool loops, LiteLLM normalises the tool-call format across providers, so you can write your loop once and swap providers via a model string. The main divergence to handle is error reporting: Anthropic requires a tool result for every tool call, even on failure (return an error message in the tool_result content), while some providers tolerate omitted results. Always return a result for every tool call; gaps produce malformed conversation histories that confuse subsequent turns.

Secure tool loop implementations should validate tool arguments before execution, not just trust the LLM-generated JSON. Even with a well-designed schema, models occasionally hallucinate parameter values outside expected ranges or inject unexpected string content into path parameters. Add a validation layer between tool call parsing and execution: check that numeric parameters are within bounds, string parameters pass regex validation, and file paths are within permitted directories. Treat LLM-generated tool arguments as untrusted user input, not internal trusted data.

For multi-turn conversations with tool use, maintain a clean call history by including all tool_use and tool_result pairs in the message history. Many LLM providers require a complete alternating sequence of assistant tool calls followed by user tool results. Always append to an existing message list rather than reconstructing it to avoid missing intermediate turns that cause API errors or degraded behaviour.
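One cheap guard against a reconstructed-and-broken history is to check the alternation invariant before each API call. A sketch, with message shapes following the Anthropic examples earlier in this document (content placeholders are illustrative strings):

```python
# Verify a message history alternates roles (no two same-role turns in a
# row), which is the invariant the tool_use / tool_result pairing relies on.
def check_alternation(messages: list[dict]) -> bool:
    for prev, curr in zip(messages, messages[1:]):
        if prev["role"] == curr["role"]:
            return False  # malformed history -- likely a dropped turn
    return True

good = [
    {"role": "user", "content": "Weather in Tokyo?"},
    {"role": "assistant", "content": "[tool_use blocks]"},
    {"role": "user", "content": "[tool_result blocks]"},
]
print(check_alternation(good))         # True
print(check_alternation(good + good))  # False: user turn followed by user turn
```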

Test your tool loop with adversarial inputs before deploying to production. Ask the model to call tools with edge-case arguments: empty strings, very large numbers, SQL injection patterns, and path traversal strings. A well-designed tool should validate and reject these inputs gracefully. Log any case where validation fails -- these represent potential security issues that need fixing before the agent handles untrusted user input at scale.