Structured Generation

LLM Output Control

JSON mode, constrained decoding, grammar-guided generation, and structured output libraries

  • prompt → schema → validated output: the guarantee chain
  • logit masking: how constraints work
  • Pydantic + LLM: the production standard
Contents
  1. The problem with free-text output
  2. Native structured output modes
  3. Instructor library
  4. Constrained decoding with Outlines
  5. Grammar-guided generation and LMQL
  6. Classification and choice constraints
  7. Production patterns
01 — Core Challenge

The Problem with Free-Text Output

LLMs produce free text by default. Production systems need structured data: JSON objects, typed fields, validated schemas.

Three approaches, in increasing order of reliability: (1) prompt engineering ("respond only in JSON"), (2) native structured output modes (OpenAI/Anthropic), (3) constrained decoding (grammar-guided token sampling).

Why prompting alone fails: the model can still produce a preamble ("Sure! Here's the JSON: ..."), truncate mid-object, or append trailing text. This is brittle at scale.

⚠️ Never use json.loads() on raw LLM output in production without a try/except and retry loop. Even "JSON mode" can produce subtly malformed output on edge cases.
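A minimal sketch of such a defensive loop. `call_model` is a hypothetical stand-in for your LLM client, and the fence-stripping and error-feedback details are illustrative, not a fixed recipe:

```python
import json

def parse_json_with_retry(call_model, prompt, max_attempts=3):
    """Call the model, try to parse JSON, and retry with the error on failure."""
    for attempt in range(max_attempts):
        raw = call_model(prompt)
        # Strip common wrappers like ```json fences before parsing
        cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError as err:
            # Feed the parse error back so the model can self-correct
            prompt = f"{prompt}\n\nYour last reply was not valid JSON ({err}). Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_attempts} attempts")

# Simulated model: fails once with a preamble, then returns clean JSON
replies = iter(['Sure! Here is the JSON: {"a": 1}', '{"a": 1}'])
result = parse_json_with_retry(lambda p: next(replies), "Return the object as JSON")
# result is the parsed dict {"a": 1}
```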
02 — Provider-Native

Native Structured Output Modes

OpenAI response_format: set to {"type": "json_object"} for best-effort JSON, or pass a Pydantic model as response_format to client.beta.chat.completions.parse for strict schema enforcement

Anthropic: define a tool whose input schema matches your target structure and force it with tool_choice; the model responds with tool_use blocks that follow the schema

Strict mode (OpenAI): guarantees output exactly matches your JSON Schema. No extra fields, no missing required fields. Uses constrained decoding under the hood.

OpenAI Strict Structured Output

from openai import OpenAI
from pydantic import BaseModel
from typing import Literal

client = OpenAI()

class DocumentAnalysis(BaseModel):
    title: str
    sentiment: Literal["positive", "negative", "neutral"]
    key_points: list[str]
    confidence: float
    requires_human_review: bool

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze the document and extract structured information."},
        {"role": "user", "content": document_text},
    ],
    response_format=DocumentAnalysis,
)

result: DocumentAnalysis = response.choices[0].message.parsed
# result is a typed Pydantic object — no parsing needed

Structured Output Methods Comparison

Method | Reliability | Schema support | Retry needed | Library
Prompt ("return JSON") | Low | Informal | Often | None
json_object mode | Medium | None (any JSON) | Sometimes | OpenAI
Strict parse (Pydantic) | High | Full Pydantic | Rarely | OpenAI beta
Tool/function calling | High | JSON Schema | Rarely | OpenAI/Anthropic
Outlines (constrained) | Very high | Regex/EBNF/JSON | Almost never | Outlines
03 — Cross-Provider

Instructor Library

instructor wraps any LLM client (OpenAI, Anthropic, Gemini, local) and adds Pydantic validation plus automatic retries. On validation failure, it feeds the error back to the model together with the original request so the model can self-correct.

Works across providers: single API, swap model without changing structured output code.

Instructor Example

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, field_validator

client = instructor.from_anthropic(Anthropic())

class ExtractedEntity(BaseModel):
    name: str
    entity_type: str
    confidence: float

    @field_validator("confidence")
    @classmethod
    def confidence_range(cls, v):
        assert 0 <= v <= 1, "confidence must be 0-1"
        return v

class ExtractionResult(BaseModel):
    entities: list[ExtractedEntity]
    summary: str

# instructor handles validation + retries automatically
result = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    response_model=ExtractionResult,
    messages=[{"role": "user", "content": f"Extract entities from: {text}"}],
)
# result.entities is a validated list[ExtractedEntity]
instructor's max_retries=3 parameter means the model gets up to 3 attempts to fix validation errors. For complex nested schemas, this dramatically reduces production failures.
04 — Token-Level

Constrained Decoding with Outlines

Constrained decoding: instead of sampling freely from the full vocabulary, mask out tokens that would violate the schema at each generation step
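The core trick can be illustrated in plain Python: set the logits of disallowed tokens to negative infinity, then renormalize, so they receive zero probability. Real libraries compile the schema into a finite-state machine over tokens; the toy vocabulary and logit values below are invented for illustration:

```python
import math

def mask_logits(logits, allowed_ids):
    """Set logits of disallowed tokens to -inf so they get zero probability."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary: 0='{', 1='}', 2='"', 3='hello'
logits = [0.1, 2.5, 1.0, 3.0]   # unconstrained, the model prefers token 3
allowed = {0}                   # the schema says a JSON object must start with '{'
probs = softmax(mask_logits(logits, allowed))
# probs[0] == 1.0; every disallowed token has probability 0.0
```

Sampling from `probs` can now only ever produce a schema-valid next token, which is why the guarantee holds at the token level rather than relying on the model's cooperation.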

Outlines: open-source library that intercepts token logits and zeroes out invalid tokens before sampling. Works with any local model (Transformers, vLLM).

Supported constraints: JSON Schema, Pydantic models, regex patterns, EBNF grammars, choice from a list

Outlines with Local Model

import outlines
from pydantic import BaseModel

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

class Movie(BaseModel):
    title: str
    director: str
    year: int
    genre: list[str]

# Constrained generation — guaranteed valid JSON matching the Movie schema
generator = outlines.generate.json(model, Movie)
movie = generator("Extract movie details from: The Godfather (1972) directed by Coppola")
# movie is a Movie object — no parsing, no retries needed

# Regex constraint
phone_gen = outlines.generate.regex(model, r"\d{3}-\d{3}-\d{4}")
phone = phone_gen("What's the phone number? Area code 555, number 867-5309")
# guaranteed to produce NNN-NNN-NNNN format

Outlines vs Native Structured Output

Aspect | OpenAI strict mode | Outlines (local)
Guarantee | Schema-level | Token-level (absolute)
Latency | API round-trip | Local GPU
Model support | gpt-4o and later OpenAI models | Any HF/GGUF model
Grammar support | JSON Schema only | JSON, regex, EBNF
Cost | Per token | Infrastructure only
05 — Grammar Approach

Grammar-Guided Generation and LMQL

EBNF grammars: define the exact syntax of valid outputs. Any context-free grammar can be enforced.

llama.cpp grammar: --grammar or --grammar-file flag to enforce output format at inference time

LMQL (Language Model Query Language): SQL-like language for constrained generation with variables, conditionals, and loops

Guidance: Microsoft library for interleaving Python control flow with LLM generation. Template with {{gen ...}} blocks.

llama.cpp Grammar for Structured Address

# address.gbnf
root  ::= city "," ws state ws zip
city  ::= [a-zA-Z ]+
state ::= [A-Z][A-Z]
zip   ::= [0-9][0-9][0-9][0-9][0-9]
ws    ::= " "?

# Usage:
# ./llama-cli -m model.gguf --grammar-file address.gbnf \
#   -p "What city is the Eiffel Tower in? Answer:"
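As an application-side sanity check, the same shape can be validated with a regular expression equivalent to the grammar above. This is a sketch for defense in depth; at inference time the .gbnf file is what enforces the format:

```python
import re

# Mirrors address.gbnf: city "," ws state ws zip, with ws ::= " "?
ADDRESS_RE = re.compile(r"^[a-zA-Z ]+, ?[A-Z]{2} ?[0-9]{5}$")

assert ADDRESS_RE.fullmatch("Paris, TX 75460")
assert not ADDRESS_RE.fullmatch("Paris TX 75460")  # comma is required by the grammar
```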
06 — Specialized

Classification and Choice Constraints

Multiple-choice: constrain output to exactly one of N options — eliminates ambiguous paraphrasing

Outlines generate.choice: guarantees output is one of the provided strings, no variations

Choice Constraint for Classification

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Guarantee exactly one of these labels — no "Positive.", "POSITIVE", etc.
sentiment = outlines.generate.choice(
    model, ["positive", "negative", "neutral", "mixed"]
)
label = sentiment("Classify: 'The product works but shipping was slow'")
# label is guaranteed to be one of the four strings

# Token efficiency: when the choices diverge at the first token,
# a single forward pass is enough to resolve the classification
⚠️ For classification with >2 classes, constrained decoding is almost always more reliable than parsing free text. The latency cost is negligible compared to the reliability gain.
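When every label begins with a distinct token, the choice can be resolved by comparing first-token scores from one forward pass. A toy sketch of that idea; the token scores and the three-character `tokenize` stand-in are invented for illustration:

```python
def classify_by_first_token(first_token_scores, labels, tokenize):
    """Pick the label whose first token has the highest model score."""
    best = None
    for label in labels:
        first = tokenize(label)[0]
        score = first_token_scores.get(first, float("-inf"))
        if best is None or score > best[1]:
            best = (label, score)
    return best[0]

# Hypothetical logits for candidate first tokens from a single forward pass
scores = {"pos": -0.2, "neg": -1.3, "neu": -2.0, "mix": -3.1}
labels = ["positive", "negative", "neutral", "mixed"]
label = classify_by_first_token(scores, labels, tokenize=lambda s: [s[:3]])
# label == "positive"
```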
07 — Deployment

Production Patterns

Validate at the boundary

  • Always validate LLM output at the application layer with Pydantic, even when using strict mode
  • Defense in depth
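A stdlib-only sketch of re-validating at the boundary (in production you would typically call `Model.model_validate` on a Pydantic schema; the field names here mirror the DocumentAnalysis example earlier and are otherwise illustrative):

```python
def validate_analysis(data: dict) -> dict:
    """Re-check the LLM payload at the application boundary, even after strict mode."""
    allowed_sentiments = {"positive", "negative", "neutral"}
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if data.get("sentiment") not in allowed_sentiments:
        raise ValueError(f"sentiment must be one of {allowed_sentiments}")
    confidence = data.get("confidence")
    if not isinstance(confidence, float) or not 0 <= confidence <= 1:
        raise ValueError("confidence must be a float in [0, 1]")
    return data

ok = validate_analysis({"title": "Q3 report", "sentiment": "neutral", "confidence": 0.9})
```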

Retry with error context

  • On validation failure, send the error message back to the model: "Your previous output failed validation: {error}. Please fix it."
  • Up to 3 retries
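The retry loop can be sketched as follows. `call_model` and `validate` are hypothetical stand-ins for your LLM client and schema check; the error-feedback wording matches the pattern above:

```python
def generate_validated(call_model, prompt, validate, max_retries=3):
    """Retry generation, appending the validation error to the prompt each time."""
    messages = prompt
    for _ in range(max_retries):
        output = call_model(messages)
        try:
            return validate(output)
        except ValueError as err:
            messages = (f"{prompt}\n\nYour previous output failed validation: "
                        f"{err}. Please fix it.")
    raise RuntimeError(f"validation failed after {max_retries} retries")

# Simulated model: out-of-range confidence first, corrected on the retry
replies = iter([{"confidence": 1.7}, {"confidence": 0.7}])

def check(out):
    if not 0 <= out["confidence"] <= 1:
        raise ValueError("confidence out of range")
    return out

result = generate_validated(lambda m: next(replies), "Score this", check)
# result == {"confidence": 0.7}
```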

📋 Schema versioning

  • Treat your output schema as part of your API contract
  • Schema changes should be versioned
  • Test new schemas against your golden eval set before deploying

Graceful degradation

  • When structured output fails after retries, fall back to a simpler schema or return a "parsing failed" sentinel
  • Log failures for schema analysis
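One way to wire the fallback chain together, using hypothetical extractor callables and a pluggable logger:

```python
def extract_with_fallback(strict_extract, simple_extract, text, log):
    """Try the rich schema first; degrade to a simpler one; finally return a sentinel."""
    try:
        return strict_extract(text)
    except ValueError as err:
        log(f"strict schema failed: {err}")
    try:
        return simple_extract(text)
    except ValueError as err:
        log(f"fallback schema failed: {err}")
    return {"status": "parsing_failed", "raw_text": text}

def strict(text):
    raise ValueError("missing required field 'entities'")  # simulate a hard failure

failures = []
result = extract_with_fallback(
    strict,
    lambda t: {"summary": t[:20]},     # simpler schema: just a summary
    "Quarterly revenue grew 12 percent",
    failures.append,
)
# result degrades to the simple schema; failures records the strict-schema error
```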

Tools & Libraries

  • instructor (validation): cross-provider structured outputs with auto-retry
  • Outlines (local): constrained decoding for local models
  • Guidance (generation): interleave Python control flow with generation
  • LMQL (query): SQL-like constrained generation language
  • Pydantic (schema): data validation and serialization
  • LangChain (framework): output parsers and chains
  • vLLM (inference): guided decoding for batch inference
  • Marvin (patterns): Pydantic integration for structured outputs
08 — Further Reading

References

Documentation & Guides