
LLM Integration Standards

OpenAI-compatible APIs, MCP, function calling schemas, and streaming protocols — the contracts that let models and tools work together

4 standards
7 sections
Python-first SDKs
Contents
  1. Why standards matter
  2. OpenAI compatibility
  3. Model Context Protocol
  4. Function calling
  5. Streaming protocols
  6. Structured outputs
  7. References
01 — Problem Statement

Why Integration Standards Matter

Every LLM provider used to ship its own proprietary API: OpenAI had one format, Anthropic another, Google yet another. Switching models meant rewriting your entire integration layer. Standards solve this: they define common interfaces for chat completion, function calling, streaming, and tool use, so you can swap providers or run local models without touching business logic.

The Cost of Fragmentation

🔒 Vendor Lock-in

  • Proprietary APIs trap you
  • Switching costs are high
  • No leverage for pricing

🔧 Maintenance Burden

  • Multiple code paths per provider
  • Hard to A/B test models
  • Inconsistent error handling

⏱️ Integration Friction

  • New tool connections require code
  • Context protocol differences
  • Function schemas incompatible

🚀 Innovation Blocked

  • Startups can't easily build tools
  • Ecosystems don't form
  • No common plugin standard

Standards flip this: they enable portability (swap implementations), composability (tools work everywhere), and competition (providers compete on quality, not lock-in).

💡 The OpenAI effect: OpenAI's API became the de facto standard. Everyone else copied its format. Now it's the industry norm — even open-source models and other vendors implement OpenAI compatibility.
02 — The Standard

OpenAI API Compatibility

The OpenAI chat/completions API is now the lingua franca of LLM serving. Anthropic offers an OpenAI-compatible endpoint alongside its native Messages API, vLLM (local inference) implements the schema fully, and Azure OpenAI, Mistral, and Groq all expose OpenAI-compatible endpoints. This means the same client code works against multiple providers with minimal changes.

Standard Chat/Completions Schema

# All these clients speak the same chat/completions schema:
# the openai SDK, Anthropic's compatibility endpoint, or any
# OpenAI-compatible server reachable over plain HTTP.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Unified message shapes
system_prompt = {"role": "system", "content": "You are a helpful assistant."}
message = {"role": "user", "content": "Summarize quantum computing"}

# Create a completion
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[system_prompt, message],
    temperature=0.7,
    max_tokens=500,
)

# The response is always the same shape
print(response.choices[0].message.content)
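Because the schema is plain JSON over HTTP, you don't even need an SDK. A minimal sketch of the same request as a raw body (the endpoint URL and `YOUR_KEY` placeholder are assumptions about your deployment, not real values):

```python
import json

# The same request, as the raw JSON body any OpenAI-compatible endpoint accepts
payload = {
    "model": "gpt-4-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize quantum computing"},
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}
headers = {"Authorization": "Bearer YOUR_KEY", "Content-Type": "application/json"}

print(json.dumps(payload)[:60])

# POST it to any compatible server (not executed in this sketch):
# import requests
# r = requests.post("http://localhost:8000/v1/chat/completions",
#                   headers=headers, json=payload)
# print(r.json()["choices"][0]["message"]["content"])
```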

Key Endpoints (OpenAI Schema)

  • POST /v1/chat/completions: chat-style generation (the core endpoint)
  • POST /v1/embeddings: vector embeddings
  • GET /v1/models: list available models

Why Everyone Implements It

OpenAI's API succeeded because it's simple, powerful, and standardized early. Every new provider implements it to ensure instant compatibility with existing tools. vLLM, Ollama, and Mistral all expose /v1/chat/completions endpoints. This means you can deploy a local model using the same client code as GPT-4.

💡 Instant portability: Write once against OpenAI API, run against GPT-4, Claude, local Llama, or Mistral with one environment variable change.
03 — Context Protocol

Model Context Protocol (MCP)

MCP is Anthropic's open protocol for connecting LLMs to external tools and data sources. It defines a client-server architecture: an MCP client (embedded in the LLM application) requests tools, resources, and prompts from an MCP server (your app or service). Tools are discovered dynamically at runtime, not hard-coded.

MCP Primitives

1

Tools — callable functions

Functions the model can invoke. Defined as JSON schema with name, description, and parameters. Model chooses which tools to call based on the task.

2

Resources — data sources

Read-only data the model can access: files, URLs, APIs. Requested by URI; server streams content back.

3

Prompts — templates

Reusable prompt templates and instructions. Model can request a prompt template with parameters filled in.

MCP Architecture

LLM Client                  MCP Server
     ↓                           ↓
┌─────────────┐         ┌──────────────────┐
│   Claude    │   ←→    │     Your App     │
│ (consumer)  │         │ (tool provider)  │
└─────────────┘         └──────────────────┘

Flows:
- Client: "list_tools"
- Server: "tools: [calc, fetch_url, query_db]"
- Client: "call calc(2+2)"
- Server: "result: 4"
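That exchange can be sketched as JSON-RPC 2.0 messages. The `tools/list` and `tools/call` method names follow the MCP spec, but the dispatcher below is a toy stand-in for a real MCP server (which runs over stdio or HTTP transports), and the `calc` tool is a hypothetical example:

```python
# Toy tool registry standing in for an MCP server's tools (demo only)
TOOLS = {
    "calc": lambda args: eval(args["expression"], {"__builtins__": {}}),
}

def handle(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 request the way an MCP server would."""
    if request["method"] == "tools/list":
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        params = request["params"]
        result = {"content": TOOLS[params["name"]](params["arguments"])}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}

# Client asks what tools exist, then calls one
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
answer = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                 "params": {"name": "calc", "arguments": {"expression": "2+2"}}})
print(listing["result"]["tools"])   # [{'name': 'calc'}]
print(answer["result"]["content"])  # 4
```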

When to Use MCP

⚠️ MCP vs function calling: Function calling (below) is the simpler choice for a single-provider integration. MCP shines in multi-tool, multi-provider scenarios where tools must be discovered at runtime. Note that an MCP client still presents the discovered tools to the model through function calling, so the two are complementary layers rather than competitors.
04 — Tool Use

Function Calling & JSON Schema

Function calling lets models invoke tools by returning structured JSON. You define tools as JSON Schema; the model decides when to use them and what parameters to pass. The standard is now OpenAI's format, replicated by Anthropic, Mistral, and others.

Define Tools as JSON Schema

# Define a tool as JSON Schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

# The model can now choose to call this tool
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
)

# Check whether the model wants to call a tool
if response.choices[0].message.tool_calls:
    for call in response.choices[0].message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")

Implement Tool Handler Loop

# Full agentic loop
messages = [{"role": "user", "content": "What's the weather in NYC and LA?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    msg = response.choices[0].message

    if msg.tool_calls:
        # Model wants tools: echo its message, then answer each call
        # with a role="tool" message keyed by tool_call_id
        messages.append(msg)
        for call in msg.tool_calls:
            result = execute_tool(call.function.name, call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
    else:
        # No more tool calls: the model has answered
        print(msg.content)
        break
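The loop assumes an `execute_tool` helper. A minimal sketch as a dispatch table; the `get_weather` body is a hypothetical stub (a real version would call a weather API):

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical stub; a real version would hit a weather API."""
    return {"city": city, "temp": 21, "unit": unit}

# Map tool names (as declared in the JSON schema) to Python handlers
TOOL_HANDLERS = {"get_weather": get_weather}

def execute_tool(name: str, arguments: str) -> dict:
    """Look up the handler and call it with the model's arguments.

    The API delivers call.function.arguments as a JSON *string*,
    so it must be parsed before the keyword call.
    """
    args = json.loads(arguments)
    return TOOL_HANDLERS[name](**args)

print(execute_tool("get_weather", '{"city": "NYC"}'))
# {'city': 'NYC', 'temp': 21, 'unit': 'celsius'}
```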

Strict Mode (Optional)

Set "strict": true on the function definition to enforce exact schema validation: the model's output is constrained to valid JSON that matches the schema, with no extra properties.
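A sketch of the `get_weather` tool from above in strict mode. Per OpenAI's structured-outputs rules, a strict schema must set `additionalProperties: false` and list every property in `required`; optional fields are expressed as nullable types:

```python
strict_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": True,  # enforce exact schema match
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                # Optional in spirit: strict mode still requires listing it,
                # so it is made nullable instead of omitted from "required"
                "unit": {"type": ["string", "null"]},
            },
            "required": ["city", "unit"],    # strict mode: every key listed
            "additionalProperties": False,   # strict mode: no extra keys
        },
    },
}
```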

05 — Real-time

Streaming Protocols (SSE)

Streaming returns tokens as they're generated, enabling real-time UIs and lower time-to-first-token. OpenAI uses Server-Sent Events (SSE). Responses arrive as delta chunks; you reconstruct the full message from chunks.

# Streaming with the OpenAI SDK
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,  # enable streaming
)

full_text = ""
finish_reason = None
for chunk in response:
    if chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_text += token
        print(token, end="", flush=True)
    if chunk.choices[0].finish_reason:
        finish_reason = chunk.choices[0].finish_reason

print(f"\nFinish reason: {finish_reason}")
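Under the hood the SDK is parsing Server-Sent Events: each chunk arrives as a `data: {json}` line and the stream ends with the `data: [DONE]` sentinel. A minimal parser sketch over a captured stream (the sample lines are illustrative, not a real API transcript):

```python
import json

# Illustrative SSE lines as they appear on the wire
raw_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

def reassemble(lines):
    """Rebuild the full message from SSE delta chunks."""
    text = ""
    for line in lines:
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel: stream is finished
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text += delta.get("content", "")
    return text

print(reassemble(raw_stream))  # Hello
```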

Stream Metadata

💡 Token counting on stream: Most providers only report token usage after the stream completes. With OpenAI you can request a final usage chunk by passing stream_options={"include_usage": True}; otherwise, count tokens client-side (e.g. with tiktoken) or fall back to a non-streaming call when you need exact counts.
06 — Deterministic Output

Structured Output & JSON Mode

JSON mode forces models to return valid JSON. Useful for parsing, data extraction, or when you need guaranteed structure. Combine with Pydantic for type-safe outputs.

# JSON mode (guarantees syntactically valid JSON)
response = client.chat.completions.create(
    model="gpt-4-turbo",
    # Note: JSON mode requires the prompt to mention JSON
    messages=[{"role": "user", "content": "Extract person data from text as JSON"}],
    response_format={"type": "json_object"},
)

import json
result = json.loads(response.choices[0].message.content)
print(result)  # guaranteed to parse

# With Pydantic (type-safe), via the instructor library
from openai import OpenAI
from pydantic import BaseModel
from instructor import patch

client = patch(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    skills: list[str]

person = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Extract person data"}],
    response_model=Person,
)
print(person.name)  # autocompleted, type-checked
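The type safety comes from Pydantic validating the model's JSON before your code touches it. A self-contained sketch of that step, with no LLM call (the payloads below are made up for illustration):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int
    skills: list[str]

# A well-formed payload parses into a typed object
person = Person.model_validate_json('{"name": "Ada", "age": 36, "skills": ["math"]}')
print(person.age + 1)  # 37: a real int, not a string

# A malformed payload fails loudly instead of corrupting downstream code
try:
    Person.model_validate_json('{"name": "Ada", "age": "unknown", "skills": []}')
except ValidationError as e:
    print("rejected:", e.error_count(), "validation error(s)")
```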
SDK & Tool Comparison

SDK & Framework Comparison

SDK/Tool        API Compat       Streaming   Tool Calling      Retries
openai-python   OpenAI native    ✓           ✓                 ✓ (built-in)
anthropic-sdk   95% compatible   ✓           ✓ (tool_use)      ✓
litellm         All providers    ✓           ✓                 ✓ + fallback
instructor      All providers    ✓           Pydantic models   ✓
LangChain       All providers    ✓           ✓                 ✓
Tools & SDKs

Related Tools

  • SDK: OpenAI Python SDK — official OpenAI client with streaming, function calling
  • SDK: Anthropic SDK — Claude API client (OpenAI-compatible)
  • Multi-provider: LiteLLM — unified SDK for all LLM providers
  • Structured: instructor — Pydantic + LLM for structured outputs
  • Framework: LangChain — LLM chains and agentic framework
  • Local: vLLM — OpenAI-compatible local inference server
  • Local: Ollama — simple OpenAI-compatible model server
  • Protocol: Model Context Protocol — Anthropic's client-server tool protocol
07 — Further Reading

References
