APPLICATIONS & SYSTEMS

Structured Outputs

Extract typed, validated data from unstructured text using JSON schemas, constrained generation, and Pydantic validation.

schema → LLM → parsing — the pattern
zero parsing errors — constraint-based guarantee
Instructor & Outlines — popular libraries
Contents
  1. The parsing problem
  2. Three approaches
  3. Instructor library
  4. Outlines framework
  5. Pydantic schemas
  6. Error handling & retry
  7. Production patterns
01 — Context

The Parsing Problem

LLMs generate text token-by-token, unconstrained. For structured extraction — entity recognition, classification, JSON generation — we need guaranteed outputs matching a schema. Naive approach: ask the LLM to return JSON, parse it, pray it's valid. This fails regularly.

Common Failures

Failure mode                                | Frequency | Cause
Invalid JSON (trailing comma, unquoted key) | 5–15%     | LLM doesn't enforce JSON spec
Missing required fields                     | 3–10%     | LLM forgets constraint
Wrong type (string instead of int)          | 2–8%      | LLM guesses data type
Field name typo                             | 1–5%      | Variation in field names
Extra unexpected fields                     | 2–7%      | LLM adds context it thinks helpful

Naive fix: add "return JSON" to prompt. Better: constrain generation. Best: guarantee schema compliance at token level.
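The parse-and-pray flow is easy to reproduce. A minimal sketch (the malformed response string is a made-up example of the most common failure, the trailing comma):

```python
import json

# A typical "almost JSON" completion: the trailing comma makes it invalid.
llm_response = '{"name": "John Doe", "age": 28,}'

try:
    data = json.loads(llm_response)
except json.JSONDecodeError as err:
    # The naive pipeline breaks here; this is the gap that validation-with-retry
    # or constrained decoding has to close.
    print(f"Parse failed: {err}")
```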

💡 Key insight: Structured output extraction is less about asking nicely, more about enforcing constraints at the generation level. Three approaches exist, with different tradeoffs.
02 — Solutions

Three Approaches to Structured Generation

Different techniques exist, ranging from simple to sophisticated. Choose based on robustness needs, latency budget, and what your model supports.

Approach 1: Native JSON Mode (Simple)

How: Some LLM APIs support a "JSON mode" that constrains output to valid JSON (OpenAI exposes a response_format flag; Anthropic gets a similar effect through tool use). Pros: Simple, fast, one API parameter. Cons: Only validates JSON format, not schema adherence. Still need client-side validation. Best for: Quick prototypes, high-confidence tasks.
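A sketch of the one-parameter approach, assuming an OpenAI-style client (the model name is illustrative, and the network call is commented out so the example stands alone):

```python
import json

# The only structured-output knob here is response_format; everything else
# is an ordinary chat request.
request = {
    "model": "gpt-4o-mini",
    "response_format": {"type": "json_object"},  # the JSON-mode flag
    "messages": [
        {"role": "system",
         "content": "Reply with a JSON object with keys: name, age, role."},
        {"role": "user", "content": "John Doe, 28, Software Engineer"},
    ],
}

# from openai import OpenAI
# resp = OpenAI().chat.completions.create(**request)
# data = json.loads(resp.choices[0].message.content)  # valid JSON, but schema unchecked
```

Note that the guarantee stops at "parses as JSON": missing fields, wrong types, and extra keys still pass, which is why client-side validation remains necessary.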

Approach 2: Post-Processing & Retry (Practical)

How: Generate response, validate against schema, retry if invalid. Use Instructor library for this. Pros: Works with any LLM, simple to implement, handles ambiguous schemas. Cons: Latency overhead (retries add 100–500ms), not guaranteed to succeed after N retries. Best for: Production systems where reliability matters and latency is acceptable.

Approach 3: Constrained Decoding (Guaranteed)

How: Use Outlines or LMQL to enforce constraints at the token level. At each step, only allow tokens that keep output schema-valid. Pros: Guaranteed valid output, no retries needed. Cons: Requires access to model logits, only works with certain inference engines (vLLM, Ollama). Best for: Self-hosted models, strict schema requirements.

Approach                             | Latency | Schema guarantee | Ease of use | Model support
Native JSON mode                     | Fast    | Format only      | Easiest     | OpenAI GPT models; Claude via tool use
Post-processing + retry (Instructor) | Medium  | With retries     | Easy        | All
Constrained decoding (Outlines)      | Medium  | Guaranteed       | Medium      | vLLM, local models
Recommendation: Start with Instructor + JSON mode. Fast to implement, covers 90% of cases. Move to Outlines if you need hard schema guarantees.
03 — Instructor Library

Instructor: Structured Extraction at Scale

Instructor is a Python library (works with many LLM APIs) that wraps LLMs and enforces Pydantic schema compliance. Define your data type as a Pydantic model, call the LLM, get back validated objects. Simple and powerful.

Core Pattern

Step 1: Define Pydantic model (your schema). Step 2: Create instructor client. Step 3: Call LLM with response_model=YourModel. Step 4: Get back validated instance. Step 5: Retries happen transparently if validation fails.

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, Field

# Define schema
class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int | None = Field(description="Age in years, or None if unknown")
    email: str | None = Field(description="Email address")
    role: str = Field(description="Job title or role")

# Create client (wraps Anthropic)
client = instructor.from_anthropic(Anthropic())

# Extract with guarantee of valid schema
person = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=256,
    response_model=Person,
    messages=[{
        "role": "user",
        "content": "Extract: John Doe, 28, john@company.com, Software Engineer",
    }],
)

# person is a validated Person instance
print(person.name)  # "John Doe"
print(person.age)   # 28
print(person.role)  # "Software Engineer"

Instructor Features

Automatic retry: If response fails validation, Instructor re-prompts with error message. Typical flow: 1 main call + 0–2 retries. Multiple models: Works with OpenAI, Anthropic, Cohere, local models. Async support: Full async/await support for concurrent extraction. Validation hooks: Use Pydantic field validators for custom logic.

⚠️ Cost of retries: Each retry is a full LLM call (tokens). For 100k extraction tasks with 5% failure rate, expect ~5k retries = 5k extra API calls. Budget accordingly or tune prompt to reduce failures.
04 — Constrained Decoding

Outlines: Token-Level Schema Enforcement

Outlines is a framework for constrained decoding. It modifies the LLM's generation loop so that only tokens preserving schema validity can be produced. Zero failures, by design.

How Outlines Works

Compilation: Convert the schema (JSON Schema, Pydantic, regex) into a finite automaton. Generation: At each step, mask tokens that are invalid in the current automaton state; the LLM can only choose among valid tokens. Output: The generated text is guaranteed to match the schema.
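A toy illustration of the masking step (this is not the Outlines implementation; real engines compile the schema into an automaton over the tokenizer vocabulary, but the invariant is the same: a token survives the mask only if the output can still reach a schema-valid string):

```python
# Schema-valid completions for a tiny enum-only schema, and a toy "vocabulary".
ALLOWED = ['{"sentiment": "positive"}',
           '{"sentiment": "negative"}',
           '{"sentiment": "neutral"}']
VOCAB = ['{"sentiment": "', 'positive', 'negative', 'neutral', 'happy', '"}']

def valid_next_tokens(prefix: str) -> list[str]:
    # A token is allowed only if appending it keeps the output a prefix
    # of at least one schema-valid completion.
    return [t for t in VOCAB if any(s.startswith(prefix + t) for s in ALLOWED)]

print(valid_next_tokens(''))                        # ['{"sentiment": "']
print(valid_next_tokens('{"sentiment": "'))         # ['positive', 'negative', 'neutral']
print(valid_next_tokens('{"sentiment": "positive')) # ['"}']
```

Note that `happy` is masked out at the value position: the model literally cannot emit an off-schema enum value, which is why constrained decoding needs no retries.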

Comparison with Instructor

Aspect            | Instructor                     | Outlines
Validation method | Post-generation, with retries  | Token-level constraints
Success rate      | ~95% (after retries)           | 100% (by design)
Latency per call  | Single call + retries          | Slightly slower (masking overhead)
Model support     | API models (OpenAI, Anthropic) | vLLM, Ollama (local models)
Ease of use       | Very simple                    | Medium (requires inference engine)
⚠️ Outlines limitation: Requires access to model logits and inference control. Only works with vLLM, Ollama, or custom inference servers. Can't use OpenAI or Anthropic APIs directly (they don't expose token masking).
05 — Schema Definition

Pydantic Schemas for Extraction

Whether using Instructor, Outlines, or native JSON mode, define schemas with Pydantic. It's the standard for Python-based LLM extraction.

Writing Good Extraction Schemas

1

Use descriptive fields — Clarity helps LLM

Field descriptions guide the LLM. Be explicit about what you want.

  • Bad: name: str
  • Good: name: str = Field(description="Person's full name (first and last)")
2

Use optional for ambiguous fields — Reduce failures

If a field might not exist, make it Optional. Reduces retry loops.

  • phone: str | None = Field(default=None)
  • Allows LLM to omit if not found
3

Use enums for categories — Constrain choices

For classification, use Enum instead of str.

  • from enum import Enum
  • class Sentiment(str, Enum): positive = "positive" ...
  • Outlines can guarantee exact enum values
4

Nest models for structure — Compose schemas

Complex extractions use nested models.

  • class Address(BaseModel): street, city, country
  • class Person(BaseModel): name, address: Address
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ReviewExtraction(BaseModel):
    product_name: str = Field(description="Name of product reviewed")
    rating: int = Field(ge=1, le=5, description="1-5 star rating")
    sentiment: Sentiment = Field(description="Overall sentiment")
    summary: str = Field(description="2-3 sentence summary of review")
    reviewer_name: Optional[str] = Field(default=None, description="Name of reviewer, if given")

# Instructor guarantees a ReviewExtraction instance:
# all fields present, rating 1-5, sentiment in enum, no extra fields
Schema design rule: When in doubt, make it Optional. An Optional field that's None is better than an exception. LLM can always omit uncertain data.
06 — Robustness

Error Handling and Retry Strategies

Even with Instructor's automatic retries, some extractions fail. Plan for it.

Common Failure Patterns

LLM refusal: The LLM declines to extract (safety filter). Handle gracefully. Ambiguous input: The text doesn't clearly contain the requested data. Return None for optional fields instead of hallucinating. Timeout: Extraction takes >30s (stuck in a retry loop). Enforce max_retries and a per-call timeout. Malformed JSON: Rare with Instructor, but possible with native JSON mode.

Best Practices

Scenario                           | Solution
Extraction fails after N retries   | Return partial result or None, log for review
Input is too short or ambiguous    | Set optional fields to None; don't retry
LLM refuses to extract             | Catch the exception, fall back to manual/default
Want to debug a failure            | Log full prompt, response, error; analyze
Batch extraction with high volume  | Set max_retries=2, timeout per task, skip problematic items
💡 Retry strategy: 1–2 retries is usually enough. Beyond that, diminishing returns. If it fails twice, it's likely the input is genuinely ambiguous. Better to return None and flag for human review than waste tokens on retries.
# Production-grade extraction with retry and fallback
# pip install instructor openai pydantic tenacity
from typing import List, Optional

import instructor
import openai
from pydantic import BaseModel, Field, field_validator
from tenacity import retry, stop_after_attempt, wait_exponential

client = instructor.from_openai(openai.OpenAI())

class LineItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)
    total: float

    @field_validator("total")
    @classmethod
    def validate_total(cls, v, info):
        expected = info.data.get("quantity", 1) * info.data.get("unit_price", 0)
        if abs(v - expected) > 0.01:
            raise ValueError(f"total {v} != quantity × unit_price {expected:.2f}")
        return v

class Invoice(BaseModel):
    invoice_number: str
    vendor: str
    date: str
    line_items: List[LineItem]
    subtotal: float
    tax_rate: Optional[float] = 0.0
    total_due: float

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=4))
def extract_invoice(raw_text: str) -> Invoice:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=Invoice,
        messages=[
            {"role": "system", "content": "Extract invoice data precisely. Validate all totals."},
            {"role": "user", "content": raw_text},
        ],
        max_retries=2,  # instructor-level retries for validation errors
    )

invoice = extract_invoice(
    "Invoice #1042 from Acme Corp, 2024-03-15. 3x Widget A @ $12.50 = $37.50. Tax 8%. Total: $40.50"
)
print(invoice.model_dump_json(indent=2))
07 — Scale & Production

Production Extraction Patterns

Structured extraction at scale requires monitoring, batching, and fallbacks.

Patterns for Production

Batch extraction: Queue incoming tasks, extract in batches (10–100). Cheaper and faster than one-by-one. Caching: Same input → same output. Cache results for duplicate texts. Model selection: Use fast models (Haiku) for simple schemas, larger models (Sonnet) for complex ones. Monitoring: Track success rate, latency, cost per extraction. Alert on drops. Fallback: If extraction fails, either return None, use rule-based fallback, or queue for human review.

Example: Production Extraction Pipeline

1. Receive document. 2. Check cache. 3. If miss, queue extraction. 4. Call Instructor with timeout=10s, max_retries=2. 5. If success, cache result, return. 6. If fail after retries, log + route to human review queue. 7. Monitor: track success%, avg latency, cost/task.

⚠️ Cost optimization: Structured extraction can be expensive at scale. 100k tasks × $0.001 per task = $100 base, plus retries. Optimize by: reducing schema complexity, using cheaper models where possible, caching aggressively.
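The cost arithmetic above as a tiny helper (the rates are this section's illustrative figures, not real pricing):

```python
def extraction_cost(n_tasks: int, cost_per_task: float, failure_rate: float,
                    retries_per_failure: float = 1.0) -> float:
    # Each failed task triggers extra full LLM calls on top of the base volume.
    retries = n_tasks * failure_rate * retries_per_failure
    return (n_tasks + retries) * cost_per_task

# 100k tasks at $0.001 each with a 5% failure rate:
print(round(extraction_cost(100_000, 0.001, 0.05), 2))  # 105.0  ($100 base + $5 of retries)
```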
08 — Further Reading

References and Related Concepts
