01 — Problem Statement
Why Integration Standards Matter
Every LLM provider used to have a proprietary API: OpenAI's GPT API format, Anthropic's different format, Google's different again. Switching models meant rewriting your entire application. Standards solve this: they define common interfaces for chat completion, function calling, streaming, and tool use, so you can swap providers or run local models without touching business logic.
The Cost of Fragmentation
🔒 Vendor Lock-in
- Proprietary APIs trap you
- Switching costs are high
- No leverage for pricing
🔧 Maintenance Burden
- Multiple code paths per provider
- Hard to A/B test models
- Inconsistent error handling
⏱️ Integration Friction
- New tool connections require code
- Context protocol differences
- Function schemas incompatible
🚀 Innovation Blocked
- Startups can't easily build tools
- Ecosystems don't form
- No common plugin standard
Standards flip this: they enable portability (swap implementations), composability (tools work everywhere), and competition (providers compete on quality, not lock-in).
💡
The OpenAI effect: OpenAI's API became the de-facto standard. Everyone else copied its format. Now it's the industry norm — even open-source models and other vendors implement OpenAI compatibility.
02 — The Standard
OpenAI API Compatibility
The OpenAI chat/completions API is now the lingua franca. Anthropic's Messages API uses the same role-based message structure, though its endpoint and response shapes differ in detail. vLLM (local inference) implements the OpenAI schema fully, and Azure OpenAI, Mistral, and Groq all expose OpenAI-compatible endpoints. This means the same client code works across providers with minimal changes.
Standard Chat/Completions Schema
# All these clients speak the same chat/completions schema
from openai import OpenAI             # OpenAI, or any compatible endpoint
# from anthropic import Anthropic     # Anthropic (similar Messages API)

client = OpenAI()  # pass base_url=... to target another provider

# Unified request shape
message = {
    "role": "user",
    "content": "Summarize quantum computing"
}
system_prompt = {
    "role": "system",
    "content": "You are a helpful assistant."
}

# Create completion
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[system_prompt, message],
    temperature=0.7,
    max_tokens=500
)

# Response is always the same shape
print(response.choices[0].message.content)
Key Endpoints (OpenAI Schema)
POST /chat/completions — Chat-based completions (main endpoint)
POST /completions — Legacy text completion (less common)
POST /embeddings — Embed text for vector search
POST /images/generations — Image generation (some providers)
GET /models — List available models
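Because the schema is plain JSON over HTTP, any HTTP client can target these endpoints. A minimal sketch of building a chat/completions request body by hand; the base URL is a placeholder, and `chat_request` is a hypothetical helper, not part of any SDK:

```python
import json
import os

# Placeholder: any OpenAI-compatible server, e.g. a local vLLM instance
BASE_URL = os.environ.get("LLM_BASE_URL", "http://localhost:8000/v1")

def chat_request(model: str, messages: list, **params) -> dict:
    """Build the JSON body for POST {BASE_URL}/chat/completions."""
    return {"model": model, "messages": messages, **params}

body = chat_request(
    "gpt-4-turbo",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=100,
)
print(json.dumps(body, indent=2))
# To actually send it, POST to f"{BASE_URL}/chat/completions" with an
# "Authorization: Bearer <key>" header (via urllib, requests, etc.).
```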
Why Everyone Implements It
OpenAI's API succeeded because it's simple, powerful, and standardized early. Every new provider implements it to ensure instant compatibility with existing tools. vLLM, Ollama, and Mistral all expose /v1/chat/completions endpoints. This means you can deploy a local model using the same client code as GPT-4.
💡
Instant portability: Write once against OpenAI API, run against GPT-4, Claude, local Llama, or Mistral with one environment variable change.
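The "one environment variable change" claim can be made concrete. A sketch of a provider registry keyed off an env var; the base URLs and model names are illustrative, so check each provider's docs for the real values:

```python
import os

# Hypothetical registry: provider name -> (base_url, default model).
# URLs and model names here are illustrative, not authoritative.
PROVIDERS = {
    "openai": ("https://api.openai.com/v1", "gpt-4-turbo"),
    "groq":   ("https://api.groq.com/openai/v1", "llama3-70b-8192"),
    "local":  ("http://localhost:8000/v1", "meta-llama/Llama-3-8B-Instruct"),
}

def client_config() -> tuple:
    """Pick (base_url, model) from the LLM_PROVIDER environment variable."""
    return PROVIDERS[os.environ.get("LLM_PROVIDER", "openai")]

base_url, model = client_config()
# client = OpenAI(base_url=base_url, api_key=os.environ["LLM_API_KEY"])
print(base_url, model)
```

Swapping providers is then `LLM_PROVIDER=local python app.py`; the request and response handling code never changes.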
03 — Context Protocol
Model Context Protocol (MCP)
MCP is an open protocol, introduced by Anthropic, for connecting LLMs to external tools and data sources. It defines a client-server architecture: the LLM application (client) requests tools, resources, and prompts from a server (your app or service). Tools are discovered dynamically, not hard-coded.
MCP Primitives
1. Tools — callable functions
Functions the model can invoke. Defined as JSON schema with name, description, and parameters. Model chooses which tools to call based on the task.
2. Resources — data sources
Read-only data the model can access: files, URLs, APIs. Requested by URI; server streams content back.
3. Prompts — templates
Reusable prompt templates and instructions. Model can request a prompt template with parameters filled in.
MCP Architecture
LLM Client MCP Server
↓ ↓
┌─────────────┐ ┌──────────────────┐
│ Claude │ ←→ │ Your App │
│ (consumer) │ │ (tool provider) │
└─────────────┘ └──────────────────┘
Flows:
- Client: "list_tools"
- Server: "tools: [calc, fetch_url, query_db]"
- Client: "call calc(2+2)"
- Server: "result: 4"
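On the wire, these flows are JSON-RPC 2.0 messages. A sketch of the list/call exchange above; the method names follow the MCP spec, while the `calc` tool and its argument shape are hypothetical:

```python
import json

# Client -> server: discover available tools
list_req = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Client -> server: invoke one of the discovered tools
call_req = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "calc", "arguments": {"expression": "2+2"}},
}

# Server -> client: result keyed to the request id
call_resp = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "4"}]},
}

for msg in (list_req, call_req, call_resp):
    print(json.dumps(msg))
```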
When to Use MCP
- You control both client (Claude) and server (your app)
- Need dynamic tool discovery without redeploying
- Want a standardized protocol for tool communication
- Building agentic systems with multiple tool sources
⚠️
MCP vs function calling: Function calling (below) is the simpler choice for a single-provider integration. MCP standardizes tool discovery and transport across providers and tool sources; under the hood, MCP-discovered tools are still invoked through the model's tool-calling mechanism.
04 — Tool Use
Function Calling & JSON Schema
Function calling lets models invoke tools by returning structured JSON. You define tools as JSON Schema; the model decides when to use them and what parameters to pass. The standard is now OpenAI's format, replicated by Anthropic, Mistral, and others.
Define Tools as JSON Schema
# Define a tool as JSON Schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Model can call this tool
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    for call in response.choices[0].message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")  # arguments is a JSON string
Implement Tool Handler Loop
# Full agentic loop
messages = [{"role": "user", "content": "What's the weather in NYC and LA?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        # Echo the assistant's tool-call turn back into the history first
        messages.append(msg)
        for call in msg.tool_calls:
            result = execute_tool(call.function.name, call.function.arguments)
            # Tool results go back as role "tool", linked by tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result)
            })
    else:
        # No more tool calls; the model produced its final answer
        print(msg.content)
        break
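The loop above assumes an `execute_tool` helper. One way to write it, as a hypothetical dispatch table over local Python functions (the weather handler is a placeholder, not a real API):

```python
import json

def get_weather(city: str, unit: str = "celsius") -> str:
    """Placeholder handler; a real one would call a weather API."""
    return f"18 degrees {unit} in {city}"

# Map tool names (as declared in the JSON schema) to Python callables
TOOL_HANDLERS = {"get_weather": get_weather}

def execute_tool(name: str, arguments: str) -> str:
    """Dispatch a model tool call. `arguments` arrives as a JSON string."""
    kwargs = json.loads(arguments)
    return TOOL_HANDLERS[name](**kwargs)

print(execute_tool("get_weather", '{"city": "NYC"}'))
# -> 18 degrees celsius in NYC
```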
Strict Mode (Optional)
Set "strict": true on a function definition to opt into strict schema validation (OpenAI structured outputs): the model is constrained to return JSON matching the schema exactly, which requires every property to be listed in required and additionalProperties: false.
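A sketch of the get_weather tool from above rewritten in strict form, following OpenAI's structured-outputs rules as I understand them (all properties required, no extra keys):

```python
strict_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "strict": True,  # enforce exact schema validation
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            # Strict mode: every property required, no additional keys
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}
print(strict_tool["function"]["name"])
```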
05 — Real-time
Streaming Protocols (SSE)
Streaming returns tokens as they're generated, enabling real-time UIs and lower time-to-first-token. OpenAI uses Server-Sent Events (SSE). Responses arrive as delta chunks; you reconstruct the full message from chunks.
# Streaming with OpenAI SDK
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True  # Enable streaming
)

full_text = ""
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_text += token
        print(token, end="", flush=True)

# The final chunk carries the finish reason
print(f"\nFinish reason: {chunk.choices[0].finish_reason}")
Stream Metadata
delta — Incremental content (tokens or tool calls)
finish_reason — Why stream ended (stop, length, tool_calls)
usage — Token counts (some providers report only at end)
💡
Token counting on stream: Most providers report token usage only in the final chunk, if at all (OpenAI requires stream_options={"include_usage": True}). If you need exact counts before the stream ends, use the non-streaming API.
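Under the hood, each SSE event is a `data:` line carrying one JSON chunk, terminated by a `data: [DONE]` sentinel. A minimal parser over raw SSE lines; the sample chunks are illustrative stand-ins for what an OpenAI-style server sends:

```python
import json

# Illustrative raw SSE lines, as an OpenAI-style server would send them
raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

def parse_sse(lines):
    """Yield decoded chunk dicts, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip SSE comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

text = ""
for chunk in parse_sse(raw_lines):
    text += chunk["choices"][0]["delta"].get("content", "")
print(text)  # -> Hello
```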
06 — Deterministic Output
Structured Output & JSON Mode
JSON mode forces models to return valid JSON. Useful for parsing, data extraction, or when you need guaranteed structure. Combine with Pydantic for type-safe outputs.
# JSON mode (returns valid JSON)
# Note: OpenAI's JSON mode requires the word "JSON" in the prompt
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Extract person data from the text as JSON"}],
    response_format={"type": "json_object"}
)

import json
result = json.loads(response.choices[0].message.content)
print(result)  # Guaranteed to parse as valid JSON

# With Pydantic (type-safe), via the instructor library
from openai import OpenAI
from pydantic import BaseModel
from instructor import patch

client = patch(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    skills: list[str]

person = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Extract person data"}],
    response_model=Person
)
print(person.name)  # Autocompleted, type-checked
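If you'd rather not add a dependency, JSON-mode output can be validated by hand. A sketch using only the standard library; the expected fields and the simulated model output are assumptions for illustration:

```python
import json
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    skills: list = field(default_factory=list)

def parse_person(raw: str) -> Person:
    """Validate JSON-mode output into a typed object, failing loudly."""
    data = json.loads(raw)
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"unexpected shape: {data}")
    return Person(data["name"], data["age"], data.get("skills", []))

# Simulated model output (a real call would supply this string)
raw = '{"name": "Ada", "age": 36, "skills": ["math", "code"]}'
person = parse_person(raw)
print(person.name, person.age)
```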
SDK & Framework Comparison

| SDK/Tool | API Compat | Streaming | Tool Calling | Retries |
| --- | --- | --- | --- | --- |
| openai-python | OpenAI native | ✓ | ✓ | ✓ |
| anthropic-sdk | Similar (Messages API) | ✓ | ✓ (tool_use) | ✓ |
| litellm | All providers | ✓ | ✓ | ✓ + fallback |
| instructor | All providers | ✓ | Pydantic models | ✓ |
| LangChain | All providers | ✓ | ✓ | ✓ |
07 — Further Reading
References