01 — Problem Statement
Why Integration Standards Matter
Every LLM provider used to have a proprietary API: OpenAI's GPT API format, Anthropic's different format, Google's different again. Switching models meant rewriting your entire application. Standards solve this: they define common interfaces for chat completion, function calling, streaming, and tool use, so you can swap providers or run local models without touching business logic.
The Cost of Fragmentation
🔒 Vendor Lock-in
- Proprietary APIs trap you
- Switching costs are high
- No leverage for pricing
🔧 Maintenance Burden
- Multiple code paths per provider
- Hard to A/B test models
- Inconsistent error handling
⏱️ Integration Friction
- New tool connections require code
- Context protocol differences
- Function schemas incompatible
🚀 Innovation Blocked
- Startups can't easily build tools
- Ecosystems don't form
- No common plugin standard
Standards flip this: they enable portability (swap implementations), composability (tools work everywhere), and competition (providers compete on quality, not lock-in).
💡
The OpenAI effect: OpenAI's API became the de-facto standard. Everyone else copied its format. Now it's the industry norm — even open-source models and other vendors implement OpenAI compatibility.
02 — The Standard
OpenAI API Compatibility
The OpenAI chat/completions API is now the lingua franca. Anthropic's Messages API uses the same role-based message structure, though its endpoint and response shapes differ in detail. vLLM (local inference) implements the OpenAI schema fully, and Azure OpenAI, Mistral, and Groq all expose OpenAI-compatible endpoints. This means the same client code works across providers with minimal changes.
Standard Chat/Completions Schema
# All these clients speak the same chat/completions schema
from openai import OpenAI             # OpenAI, or any compatible endpoint
# from anthropic import Anthropic     # Anthropic (similar Messages API)

client = OpenAI()  # pass base_url=... to target another provider

# Unified request shape
message = {
    "role": "user",
    "content": "Summarize quantum computing"
}
system_prompt = {
    "role": "system",
    "content": "You are a helpful assistant."
}

# Create completion
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[system_prompt, message],
    temperature=0.7,
    max_tokens=500
)

# Response is always the same shape
print(response.choices[0].message.content)
Key Endpoints (OpenAI Schema)
POST /chat/completions — Chat-based completions (main endpoint)
POST /completions — Legacy text completion (less common)
POST /embeddings — Embed text for vector search
POST /images/generations — Image generation (some providers)
GET /models — List available models
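Because the schema is plain JSON over HTTP, any HTTP client can target these endpoints. A minimal sketch of building a chat/completions request body by hand; the base URL is a placeholder, and `chat_request` is a hypothetical helper, not part of any SDK:

```python
import json
import os

# Placeholder: any OpenAI-compatible server, e.g. a local vLLM instance
BASE_URL = os.environ.get("LLM_BASE_URL", "http://localhost:8000/v1")

def chat_request(model: str, messages: list, **params) -> dict:
    """Build the JSON body for POST {BASE_URL}/chat/completions."""
    return {"model": model, "messages": messages, **params}

body = chat_request(
    "gpt-4-turbo",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=100,
)
print(json.dumps(body, indent=2))
# To actually send it, POST to f"{BASE_URL}/chat/completions" with an
# "Authorization: Bearer <key>" header (via urllib, requests, etc.).
```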
Why Everyone Implements It
OpenAI's API succeeded because it's simple, powerful, and standardized early. Every new provider implements it to ensure instant compatibility with existing tools. vLLM, Ollama, and Mistral all expose /v1/chat/completions endpoints. This means you can deploy a local model using the same client code as GPT-4.
💡
Instant portability: Write once against OpenAI API, run against GPT-4, Claude, local Llama, or Mistral with one environment variable change.
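The "one environment variable change" claim can be made concrete. A sketch of a provider registry keyed off an env var; the base URLs and model names are illustrative, so check each provider's docs for the real values:

```python
import os

# Hypothetical registry: provider name -> (base_url, default model).
# URLs and model names here are illustrative, not authoritative.
PROVIDERS = {
    "openai": ("https://api.openai.com/v1", "gpt-4-turbo"),
    "groq":   ("https://api.groq.com/openai/v1", "llama3-70b-8192"),
    "local":  ("http://localhost:8000/v1", "meta-llama/Llama-3-8B-Instruct"),
}

def client_config() -> tuple:
    """Pick (base_url, model) from the LLM_PROVIDER environment variable."""
    return PROVIDERS[os.environ.get("LLM_PROVIDER", "openai")]

base_url, model = client_config()
# client = OpenAI(base_url=base_url, api_key=os.environ["LLM_API_KEY"])
print(base_url, model)
```

Swapping providers is then `LLM_PROVIDER=local python app.py`; the request and response handling code never changes.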
03 — Context Protocol
Model Context Protocol (MCP)
MCP is an open protocol, introduced by Anthropic, for connecting LLMs to external tools and data sources. It defines a client-server architecture: the LLM application (client) requests tools, resources, and prompts from a server (your app or service). Tools are discovered dynamically, not hard-coded.
MCP Primitives
1. Tools — callable functions
Functions the model can invoke. Defined as JSON schema with name, description, and parameters. Model chooses which tools to call based on the task.
2. Resources — data sources
Read-only data the model can access: files, URLs, APIs. Requested by URI; server streams content back.
3. Prompts — templates
Reusable prompt templates and instructions. Model can request a prompt template with parameters filled in.
MCP Architecture
LLM Client MCP Server
↓ ↓
┌─────────────┐ ┌──────────────────┐
│ Claude │ ←→ │ Your App │
│ (consumer) │ │ (tool provider) │
└─────────────┘ └──────────────────┘
Flows:
- Client: "list_tools"
- Server: "tools: [calc, fetch_url, query_db]"
- Client: "call calc(2+2)"
- Server: "result: 4"
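On the wire, these flows are JSON-RPC 2.0 messages. A sketch of the list/call exchange above; the method names follow the MCP spec, while the `calc` tool and its argument shape are hypothetical:

```python
import json

# Client -> server: discover available tools
list_req = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Client -> server: invoke one of the discovered tools
call_req = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "calc", "arguments": {"expression": "2+2"}},
}

# Server -> client: result keyed to the request id
call_resp = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "4"}]},
}

for msg in (list_req, call_req, call_resp):
    print(json.dumps(msg))
```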
When to Use MCP
- You control both client (Claude) and server (your app)
- Need dynamic tool discovery without redeploying
- Want a standardized protocol for tool communication
- Building agentic systems with multiple tool sources
⚠️
MCP vs function calling: Function calling (below) is the simpler choice for a single-provider integration. MCP standardizes tool discovery and transport across providers and tool sources; under the hood, MCP-discovered tools are still invoked through the model's tool-calling mechanism.
04 — Tool Use
Function Calling & JSON Schema
Function calling lets models invoke tools by returning structured JSON. You define tools as JSON Schema; the model decides when to use them and what parameters to pass. The standard is now OpenAI's format, replicated by Anthropic, Mistral, and others.
Define Tools as JSON Schema
# Define a tool as JSON Schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Model can call this tool
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools
)

# Check if the model wants to call a tool
if response.choices[0].message.tool_calls:
    for call in response.choices[0].message.tool_calls:
        print(f"Tool: {call.function.name}")
        print(f"Args: {call.function.arguments}")  # arguments is a JSON string
Implement Tool Handler Loop
# Full agentic loop
messages = [{"role": "user", "content": "What's the weather in NYC and LA?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        # Echo the assistant's tool-call turn back into the history first
        messages.append(msg)
        for call in msg.tool_calls:
            result = execute_tool(call.function.name, call.function.arguments)
            # Tool results go back as role "tool", linked by tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result)
            })
    else:
        # No more tool calls; the model produced its final answer
        print(msg.content)
        break
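The loop above assumes an `execute_tool` helper. One way to write it, as a hypothetical dispatch table over local Python functions (the weather handler is a placeholder, not a real API):

```python
import json

def get_weather(city: str, unit: str = "celsius") -> str:
    """Placeholder handler; a real one would call a weather API."""
    return f"18 degrees {unit} in {city}"

# Map tool names (as declared in the JSON schema) to Python callables
TOOL_HANDLERS = {"get_weather": get_weather}

def execute_tool(name: str, arguments: str) -> str:
    """Dispatch a model tool call. `arguments` arrives as a JSON string."""
    kwargs = json.loads(arguments)
    return TOOL_HANDLERS[name](**kwargs)

print(execute_tool("get_weather", '{"city": "NYC"}'))
# -> 18 degrees celsius in NYC
```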
Strict Mode (Optional)
Set "strict": true on a function definition to opt into strict schema validation (OpenAI structured outputs): the model is constrained to return JSON matching the schema exactly, which requires every property to be listed in required and additionalProperties: false.
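A sketch of the get_weather tool from above rewritten in strict form, following OpenAI's structured-outputs rules as I understand them (all properties required, no extra keys):

```python
strict_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "strict": True,  # enforce exact schema validation
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            # Strict mode: every property required, no additional keys
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}
print(strict_tool["function"]["name"])
```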
05 — Real-time
Streaming Protocols (SSE)
Streaming returns tokens as they're generated, enabling real-time UIs and lower time-to-first-token. OpenAI uses Server-Sent Events (SSE). Responses arrive as delta chunks; you reconstruct the full message from chunks.
# Streaming with OpenAI SDK
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True  # Enable streaming
)

full_text = ""
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_text += token
        print(token, end="", flush=True)

# The final chunk carries the finish reason
print(f"\nFinish reason: {chunk.choices[0].finish_reason}")
Stream Metadata
delta — Incremental content (tokens or tool calls)
finish_reason — Why stream ended (stop, length, tool_calls)
usage — Token counts (some providers report only at end)
💡
Token counting on stream: Most providers report token usage only in the final chunk, if at all (OpenAI requires stream_options={"include_usage": True}). If you need exact counts before the stream ends, use the non-streaming API.
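Under the hood, each SSE event is a `data:` line carrying one JSON chunk, terminated by a `data: [DONE]` sentinel. A minimal parser over raw SSE lines; the sample chunks are illustrative stand-ins for what an OpenAI-style server sends:

```python
import json

# Illustrative raw SSE lines, as an OpenAI-style server would send them
raw_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

def parse_sse(lines):
    """Yield decoded chunk dicts, stopping at the [DONE] sentinel."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip SSE comments / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

text = ""
for chunk in parse_sse(raw_lines):
    text += chunk["choices"][0]["delta"].get("content", "")
print(text)  # -> Hello
```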
06 — Deterministic Output
Structured Output & JSON Mode
JSON mode forces models to return valid JSON. Useful for parsing, data extraction, or when you need guaranteed structure. Combine with Pydantic for type-safe outputs.
# JSON mode (returns valid JSON)
# Note: OpenAI's JSON mode requires the word "JSON" in the prompt
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Extract person data from the text as JSON"}],
    response_format={"type": "json_object"}
)

import json
result = json.loads(response.choices[0].message.content)
print(result)  # Guaranteed to parse as valid JSON

# With Pydantic (type-safe), via the instructor library
from openai import OpenAI
from pydantic import BaseModel
from instructor import patch

client = patch(OpenAI())

class Person(BaseModel):
    name: str
    age: int
    skills: list[str]

person = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Extract person data"}],
    response_model=Person
)
print(person.name)  # Autocompleted, type-checked
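If you'd rather not add a dependency, JSON-mode output can be validated by hand. A sketch using only the standard library; the expected fields and the simulated model output are assumptions for illustration:

```python
import json
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    skills: list = field(default_factory=list)

def parse_person(raw: str) -> Person:
    """Validate JSON-mode output into a typed object, failing loudly."""
    data = json.loads(raw)
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"unexpected shape: {data}")
    return Person(data["name"], data["age"], data.get("skills", []))

# Simulated model output (a real call would supply this string)
raw = '{"name": "Ada", "age": 36, "skills": ["math", "code"]}'
person = parse_person(raw)
print(person.name, person.age)
```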
SDK & Framework Comparison

| SDK/Tool | API Compat | Streaming | Tool Calling | Retries |
| --- | --- | --- | --- | --- |
| openai-python | OpenAI native | ✓ | ✓ | ✓ |
| anthropic-sdk | Similar (Messages API) | ✓ | ✓ (tool_use) | ✓ |
| litellm | All providers | ✓ | ✓ | ✓ + fallback |
| instructor | All providers | ✓ | Pydantic models | ✓ |
| LangChain | All providers | ✓ | ✓ | ✓ |
07 — Further Reading
References