LLMs generate text token-by-token, unconstrained. For structured extraction — entity recognition, classification, JSON generation — we need guaranteed outputs matching a schema. Naive approach: ask the LLM to return JSON, parse it, pray it's valid. This fails regularly.
Common Failures

| Failure mode | Frequency | Cause |
| --- | --- | --- |
| Invalid JSON (trailing comma, unquoted key) | 5–15% | LLM doesn't enforce JSON spec |
| Missing required fields | 3–10% | LLM forgets constraint |
| Wrong type (string instead of int) | 2–8% | LLM guesses data type |
| Field name typo | 1–5% | Variation in field names |
| Extra unexpected fields | 2–7% | LLM adds context it thinks helpful |
Naive fix: add "return JSON" to prompt. Better: constrain generation. Best: guarantee schema compliance at token level.
💡Key insight: Structured output extraction is less about asking nicely, more about enforcing constraints at the generation level. Three approaches exist, with different tradeoffs.
02 — Solutions
Three Approaches to Structured Generation
Different techniques exist, ranging from simple to sophisticated. Choose based on robustness needs, latency budget, and what your model supports.
Approach 1: Native JSON Mode (Simple)
How: Some LLM APIs support a JSON mode that constrains output to valid JSON (OpenAI exposes it as `response_format={"type": "json_object"}`; Claude achieves the same via tool use). Pros: Simple, fast, one API parameter. Cons: Only validates JSON format, not schema adherence, so client-side validation is still needed. Best for: Quick prototypes, high-confidence tasks.
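A sketch of JSON mode in use (`response_format={"type": "json_object"}` is OpenAI's flag; `complete` stands in for the actual API call so the remaining client-side schema check is visible):

```python
import json

def extract_with_json_mode(complete, text: str) -> dict:
    # `complete` is any chat-completion callable; with the OpenAI SDK it would
    # wrap client.chat.completions.create(...) and return the message content.
    raw = complete(
        messages=[
            {"role": "system", "content": "Return a JSON object with keys name and age."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},  # format guarantee only
    )
    data = json.loads(raw)  # parsing now always succeeds...
    missing = {"name", "age"} - data.keys()  # ...but schema checks are still on you
    if missing:
        raise ValueError(f"model omitted required keys: {missing}")
    return data

# A stub standing in for the real API call:
fake = lambda messages, response_format: '{"name": "John Doe", "age": 28}'
print(extract_with_json_mode(fake, "John Doe, 28"))  # {'name': 'John Doe', 'age': 28}
```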
Approach 2: Post-Processing & Retry (Practical)
How: Generate a response, validate it against the schema, retry if invalid. The Instructor library automates this loop. Pros: Works with any LLM, simple to implement, handles ambiguous schemas. Cons: Latency overhead (retries add 100–500ms each), and success isn't guaranteed after N retries. Best for: Production systems where reliability matters and the latency is acceptable.
Approach 3: Constrained Decoding (Guaranteed)
How: Use Outlines or LMQL to enforce constraints at the token level. At each step, only allow tokens that keep output schema-valid. Pros: Guaranteed valid output, no retries needed. Cons: Requires access to model logits, only works with certain inference engines (vLLM, Ollama). Best for: Self-hosted models, strict schema requirements.
| Approach | Latency | Schema guarantee | Ease of use | Model support |
| --- | --- | --- | --- | --- |
| Native JSON mode | Fast | Format only | Easiest | Claude, GPT-4 |
| Post-processing + retry (Instructor) | Medium | With retries | Easy | All |
| Constrained decoding (Outlines) | Medium | Guaranteed | Medium | vLLM, local |
✓Recommendation: Start with Instructor + JSON mode. Fast to implement, covers 90% of cases. Move to Outlines if you need hard schema guarantees.
03 — Instructor Library
Instructor: Structured Extraction at Scale
Instructor is a Python library (works with many LLM APIs) that wraps LLMs and enforces Pydantic schema compliance. Define your data type as a Pydantic model, call the LLM, get back validated objects. Simple and powerful.
Core Pattern
Step 1: Define Pydantic model (your schema). Step 2: Create instructor client. Step 3: Call LLM with response_model=YourModel. Step 4: Get back validated instance. Step 5: Retries happen transparently if validation fails.
```python
import instructor
from pydantic import BaseModel, Field
from anthropic import Anthropic

# Define schema
class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int | None = Field(description="Age in years, or None if unknown")
    email: str | None = Field(description="Email address")
    role: str = Field(description="Job title or role")

# Create client (wraps Anthropic)
client = instructor.from_anthropic(Anthropic())

# Extract with guarantee of valid schema
person = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    response_model=Person,
    messages=[{
        "role": "user",
        "content": "Extract: John Doe, 28, john@company.com, Software Engineer"
    }],
)

# person is a validated Person instance
print(person.name)  # "John Doe"
print(person.age)   # 28
print(person.role)  # "Software Engineer"
```
Instructor Features
Automatic retry: If response fails validation, Instructor re-prompts with error message. Typical flow: 1 main call + 0–2 retries. Multiple models: Works with OpenAI, Anthropic, Cohere, local models. Async support: Full async/await support for concurrent extraction. Validation hooks: Use Pydantic field validators for custom logic.
⚠️Cost of retries: Each retry is a full LLM call (tokens). For 100k extraction tasks with 5% failure rate, expect ~5k retries = 5k extra API calls. Budget accordingly or tune prompt to reduce failures.
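The back-of-envelope math above, as a toy helper (the figures are this section's examples, not measurements):

```python
def retry_budget(tasks: int, failure_rate: float, cost_per_call: float) -> float:
    # Expected extra spend when each validation failure triggers one retry call.
    expected_retries = tasks * failure_rate
    return expected_retries * cost_per_call

# 100k tasks at a 5% first-pass failure rate and $0.001/call -> ~5k extra calls, ~$5
print(round(retry_budget(100_000, 0.05, 0.001), 2))  # 5.0
```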
04 — Constrained Decoding
Outlines: Token-Level Schema Enforcement
Outlines is a framework for constrained decoding. It modifies the LLM's generation loop to only produce tokens that maintain schema validity. Zero format failures, by design.
How Outlines Works
Compilation: Convert the schema (JSON Schema, Pydantic, regex) to a finite automaton. Generation: At each step, mask out tokens that would leave the automaton's valid states; the LLM can only choose schema-preserving tokens. Output: The generated text is guaranteed to match the schema.
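The masking step can be illustrated with a character-level toy (real engines like Outlines mask whole-token logits against the compiled automaton inside the inference engine; here `score` stands in for the model's logits and the "vocabulary" is single characters):

```python
import random

# Toy constrained decoding for an enum field: only characters that keep the
# prefix completable to an allowed value survive the mask.
ALLOWED = ["positive", "negative", "neutral"]

def valid_next_chars(prefix: str) -> set[str]:
    return {w[len(prefix)] for w in ALLOWED
            if w.startswith(prefix) and len(w) > len(prefix)}

def constrained_decode(score) -> str:
    # score(prefix, char) -> float stands in for the model's logit for `char`.
    out = ""
    while out not in ALLOWED:
        mask = valid_next_chars(out)  # masked-out characters can never be chosen
        out += max(mask, key=lambda c: score(out, c))
    return out

# Even a "model" emitting random scores cannot escape the schema:
random.seed(0)
print(constrained_decode(lambda prefix, c: random.random()) in ALLOWED)  # True
```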
Comparison with Instructor
| Aspect | Instructor | Outlines |
| --- | --- | --- |
| Validation method | Post-generation, with retries | Token-level constraints |
| Success rate | ~95% (after retries) | 100% (by design) |
| Latency per call | Single + retries | Slightly slower (masking overhead) |
| Model support | API models (OpenAI, Anthropic) | vLLM, Ollama (local models) |
| Ease of use | Very simple | Medium (requires inference engine) |
⚠️Outlines limitation: Requires access to model logits and inference control. Only works with vLLM, Ollama, or custom inference servers. Can't use OpenAI or Anthropic APIs directly (they don't expose token masking).
05 — Schema Definition
Pydantic Schemas for Extraction
Whether using Instructor, Outlines, or native JSON mode, define schemas with Pydantic. It's the standard for Python-based LLM extraction.
Writing Good Extraction Schemas
1. Use descriptive fields — clarity helps the LLM. Field descriptions guide the model; be explicit about what you want.
   Bad: `name: str`
   Good: `name: str = Field(description="Person's full name (first and last)")`
2. Use Optional for ambiguous fields — reduce failures. If a field might not exist, make it Optional; this cuts down retry loops.
   `phone: str | None = Field(default=None)` lets the LLM omit the field if it isn't found.
3. Use enums for categories — constrain choices. For classification, use an Enum instead of a bare str.
   `class Sentiment(str, Enum): positive = "positive" ...` — Outlines can then guarantee exact enum values.
4. Nest models for structure — compose schemas. Complex extractions use nested models: an `Address(BaseModel)` with street/city/country fields, referenced from `Person(BaseModel)` as `address: Address`.
```python
from pydantic import BaseModel, Field
from enum import Enum
from typing import Optional

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ReviewExtraction(BaseModel):
    product_name: str = Field(description="Name of product reviewed")
    rating: int = Field(ge=1, le=5, description="1-5 star rating")
    sentiment: Sentiment = Field(description="Overall sentiment")
    summary: str = Field(description="2-3 sentence summary of review")
    reviewer_name: Optional[str] = Field(default=None, description="Name of reviewer, if given")

# Instructor guarantees a ReviewExtraction instance:
# all fields present, rating 1-5, sentiment in enum, no extra fields
```
✓Schema design rule: When in doubt, make it Optional. An Optional field that's None is better than an exception. LLM can always omit uncertain data.
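A quick demonstration of how such a schema behaves at validation time (pure Pydantic, no LLM involved; a trimmed version of the schema above):

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class ReviewExtraction(BaseModel):
    product_name: str
    rating: int = Field(ge=1, le=5)
    reviewer_name: Optional[str] = None  # omit rather than hallucinate

# An omitted Optional field comes back as None -- no exception:
ok = ReviewExtraction(product_name="Widget A", rating=4)
print(ok.reviewer_name)  # None

# A constraint violation raises ValidationError -- the same signal
# Instructor feeds back to the model on retry:
try:
    ReviewExtraction(product_name="Widget A", rating=9)
except ValidationError as err:
    print(err.error_count())  # 1
```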
06 — Robustness
Error Handling and Retry Strategies
Even with Instructor's automatic retries, some extractions fail. Plan for it.
Common Failure Patterns
LLM refusal: The LLM declines to extract (safety filter); handle gracefully. Ambiguous input: The text doesn't clearly contain the requested data; return None for Optional fields instead of hallucinating. Timeout: Extraction takes >30s (stuck in a retry loop); enforce a max_retries limit and a per-call timeout. Malformed JSON: Rare with Instructor, but possible with native JSON mode.
Best Practices
| Scenario | Solution |
| --- | --- |
| Extraction fails after N retries | Return partial result or None, log for review |
| Input is too short or ambiguous | Set optional fields to None; don't retry |
| LLM refuses to extract | Catch exception, fall back to manual/default |
| Want to debug a failure | Log full prompt, response, error; analyze |
| Batch extraction with high volume | Set max_retries=2, timeout per task, skip problematic items |
💡Retry strategy: 1–2 retries is usually enough. Beyond that, diminishing returns. If it fails twice, it's likely the input is genuinely ambiguous. Better to return None and flag for human review than waste tokens on retries.
```python
# Production-grade extraction with retry and fallback
# pip install instructor openai pydantic tenacity
from pydantic import BaseModel, Field, field_validator
from typing import Optional, List
import instructor, openai
from tenacity import retry, stop_after_attempt, wait_exponential

client = instructor.from_openai(openai.OpenAI())

class LineItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)
    total: float

    @field_validator("total")
    @classmethod
    def validate_total(cls, v, info):
        expected = info.data.get("quantity", 1) * info.data.get("unit_price", 0)
        if abs(v - expected) > 0.01:
            raise ValueError(f"total {v} != quantity × unit_price {expected:.2f}")
        return v

class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    date: str
    line_items: List[LineItem]
    subtotal: float
    tax_rate: Optional[float] = 0.0
    total_due: float

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=4))
def extract_invoice(raw_text: str) -> Invoice:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=Invoice,
        messages=[
            {"role": "system", "content": "Extract invoice data precisely. Validate all totals."},
            {"role": "user", "content": raw_text},
        ],
        max_retries=2,  # instructor-level retries for validation errors
    )

invoice = extract_invoice("Invoice #1042 from Acme Corp, 2024-03-15. "
                          "3x Widget A @ $12.50 = $37.50. Tax 8%. Total: $40.50")
print(invoice.model_dump_json(indent=2))
```
07 — Scale & Production
Production Extraction Patterns
Structured extraction at scale requires monitoring, batching, and fallbacks.
Patterns for Production
Batch extraction: Queue incoming tasks, extract in batches (10–100). Cheaper and faster than one-by-one. Caching: Same input → same output. Cache results for duplicate texts. Model selection: Use fast models (Haiku) for simple schemas, larger models (Sonnet) for complex ones. Monitoring: Track success rate, latency, cost per extraction. Alert on drops. Fallback: If extraction fails, either return None, use rule-based fallback, or queue for human review.
Example: Production Extraction Pipeline
1. Receive document. 2. Check cache. 3. If miss, queue extraction. 4. Call Instructor with timeout=10s, max_retries=2. 5. If success, cache result, return. 6. If fail after retries, log + route to human review queue. 7. Monitor: track success%, avg latency, cost/task.
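The pipeline above can be sketched as follows (the in-memory `cache` and `review_queue` are stand-ins for Redis/SQLite and a real review system; `extract` stands in for the Instructor call):

```python
import hashlib
from typing import Callable, Optional

cache: dict[str, dict] = {}      # swap for Redis/SQLite in a real deployment
review_queue: list[str] = []     # failed extractions routed to human review

def extract_document(text: str, extract: Callable[[str], dict],
                     max_retries: int = 2) -> Optional[dict]:
    key = hashlib.sha256(text.encode()).hexdigest()
    if key in cache:                       # step 2: duplicate input -> cached result
        return cache[key]
    for _ in range(max_retries + 1):       # step 4: bounded retries
        try:
            result = extract(text)         # the Instructor call would go here
            cache[key] = result            # step 5: cache on success
            return result
        except Exception:
            continue
    review_queue.append(text)              # step 6: route to human review
    return None
```

A real pipeline would add a per-call timeout and emit the success-rate, latency, and cost metrics from step 7.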
⚠️Cost optimization: Structured extraction can be expensive at scale. 100k tasks × $0.001 per task = $100 base, plus retries. Optimize by: reducing schema complexity, using cheaper models where possible, caching aggressively.
08 — Further Reading
References and Related Concepts
Child Concepts
Instructor — Python library for structured extraction