01 — The Problem
Beyond Manual Prompt Engineering
Manual prompting: write a prompt, test on examples, tweak wording, repeat. Brittle — small wording changes cause large quality swings. Doesn't scale.
Programmatic prompting: define what you want (task signature, metric), let a framework optimize how to get it (prompt wording, few-shot examples, chain structure)
Key insight: prompts are hyperparameters. They should be optimized on training data, not tuned by intuition.
⚠️
Manual prompt engineering reaches a ceiling quickly. For tasks with >100 labeled examples, programmatic optimization (DSPy, OPRO, APE) consistently outperforms hand-tuned prompts.
02 — Foundation
Prompt Templates and Jinja2
Template engines: parameterize prompts so dynamic content is cleanly separated from instructions.
Jinja2: Python's standard template engine. Supports conditionals, loops, filters — useful for complex prompt construction.
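Before any LangChain wrapping, a minimal standalone Jinja2 render shows the conditional-section pattern used below:

```python
from jinja2 import Template

# Conditional block: the context section only appears when context is non-empty
template = Template(
    "You are a {{ role }} assistant."
    "{% if context %}\nUse this context:\n{{ context }}{% endif %}"
)

print(template.render(role="support", context=""))
# context is falsy, so the conditional section is omitted entirely
```

Passing a non-empty `context` re-renders the full section; the instructions stay fixed while the dynamic content varies.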
LangChain PromptTemplate: supports f-string (default) or Jinja2 template formats and adds LLM-specific tooling (message formatting, partial templates, composition)
Example: Jinja2 + LangChain Prompt Template
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

# System template with conditional sections (Jinja2 syntax)
system_template = """You are a {{ role }} assistant.
{% if context %}
Use this context to answer questions:
{{ context }}
{% endif %}
{% if output_format == "json" %}
Always respond in valid JSON.
{% endif %}
"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_template),
        MessagesPlaceholder("history"),  # dynamic conversation history
        ("human", "{{ question }}"),
    ],
    template_format="jinja2",  # default is f-string; {% if %} blocks need Jinja2
)
# Compose with LLM
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
result = chain.invoke({
    "role": "financial analyst",
    "context": retrieved_docs,          # e.g. concatenated retrieval results
    "output_format": "json",
    "history": conversation_history,    # list of prior chat messages
    "question": "What was Q3 revenue growth?"
})
03 — Framework
DSPy: Compile Your Prompts
DSPy (Declarative Self-improving Python): define your task as a typed Signature, compose Modules (Predict, ChainOfThought, ReAct), then compile with an Optimizer that finds the best prompts + few-shot examples automatically.
No manual prompt strings in your code. The optimizer writes the prompts.
Signatures: declare inputs, outputs, and docstring description of the task. DSPy infers the prompt.
Example: DSPy Classification Pipeline
from typing import Literal

import dspy

lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 1. Define task signature
class SentimentClassifier(dspy.Signature):
    """Classify the sentiment of customer feedback."""
    feedback: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="0.0 to 1.0")

# 2. Build module
classifier = dspy.Predict(SentimentClassifier)

# 3. Compile with optimizer (finds best few-shot examples)
# accuracy_metric: callable(example, prediction, trace=None) -> bool/float
# train_examples: list of dspy.Example with feedback/sentiment fields
optimizer = dspy.BootstrapFewShot(metric=accuracy_metric, max_bootstrapped_demos=4)
compiled = optimizer.compile(classifier, trainset=train_examples)

# 4. Use
result = compiled(feedback="The delivery was fast but packaging was damaged")
print(result.sentiment, result.confidence)
DSPy Modules
| Module | What it does | When to use |
| --- | --- | --- |
| Predict | Single LLM call with signature | Classification, extraction |
| ChainOfThought | Adds reasoning field | Math, logic, analysis |
| ReAct | Tool-use + reasoning loop | Agents, multi-step tasks |
| MultiChainComparison | Multiple chains, pick best | High-stakes decisions |
| Retrieve | RAG retrieval step | Any RAG pipeline |
04 — Optimization
DSPy Optimizers
BootstrapFewShot: runs your program on training examples, identifies successful traces, uses them as few-shot examples — automatic few-shot selection.
MIPRO (v2): optimizes both instructions AND few-shot examples simultaneously using Bayesian search over prompt candidates.
BootstrapFinetune: instead of in-context few-shot, fine-tunes the model weights on bootstrapped traces.
Example: MIPRO Optimization
from dspy.teleprompt import MIPROv2
optimizer = MIPROv2(
metric=my_metric,
auto="medium", # "light" / "medium" / "heavy" — controls search budget
num_threads=8
)
compiled_program = optimizer.compile(
my_program,
trainset=train_data, # labeled examples for optimization
valset=val_data, # held-out for optimizer eval
requires_permission_to_run=False
)
# compiled_program has optimized prompts + few-shot examples
# Typically 10-30% better than manually written prompts
✓
MIPROv2 with auto="medium" is the current recommended default for most DSPy programs. It takes 30–60 minutes but finds prompts that consistently outperform manual engineering.
05 — Composition
LangChain Expression Language (LCEL)
LCEL: pipe-based composition of LangChain components. Chain = prompt | model | parser.
Supports: streaming, async, parallel branches, fallbacks, retries — all composable
Example: Parallel Chain with LCEL
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# summary_prompt, risks_prompt, synthesis_prompt: ChatPromptTemplates defined elsewhere
# retriever: any LangChain retriever (e.g. vector_store.as_retriever())

# Two parallel analysis paths
parallel_chain = RunnableParallel(
    summary=summary_prompt | llm | StrOutputParser(),
    risks=risks_prompt | llm | StrOutputParser(),
    original=RunnablePassthrough()
)

# Full pipeline: retrieve → analyze in parallel → synthesize
full_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | parallel_chain
    | synthesis_prompt
    | llm
    | StrOutputParser()
)

result = full_chain.invoke("What are the key risks in Q3 earnings?")
# result is the final synthesized string; the parallel summary and risks
# outputs feed synthesis_prompt as intermediate variables
06 — Advanced
Automatic Prompt Optimization (APE, OPRO)
APE (Automatic Prompt Engineer): generate candidate instruction paraphrases using LLM, evaluate each on dev set, select best-performing instruction.
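The APE recipe reduces to generate-score-select. A minimal sketch with the LLM paraphrase step stubbed out: `propose` and `score` are hypothetical placeholders (in practice `propose` calls an LLM and `score` runs the dev set):

```python
def ape_select(seed_instruction, dev_set, score, propose, n_candidates=8):
    """Generate candidate instructions, score each on a dev set, keep the best."""
    candidates = [seed_instruction] + [
        propose(seed_instruction) for _ in range(n_candidates)
    ]
    scored = [(score(c, dev_set), c) for c in candidates]
    best_score, best = max(scored, key=lambda t: t[0])
    return best, best_score

# Toy stand-ins: propose appends a clarifying suffix; score prefers longer prompts
best, s = ape_select(
    "Classify the sentiment.",
    dev_set=[],
    score=lambda c, _: len(c),
    propose=lambda c: c + " Answer with one word: positive, negative, or neutral.",
)
print(best)
```

Replacing the toy `score` with a real task metric over labeled examples is what makes the selection meaningful.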
OPRO (Optimization by Prompting): frame prompt optimization as a meta-prompt problem — feed current prompt + performance scores to LLM, ask it to suggest improvements, iterate.
Example: Simple OPRO Loop
def opro_optimize(task_desc: str, examples: list, metric, iterations=10):
    # llm: any chat model, e.g. ChatOpenAI(...)
    # evaluate: runs the prompt on examples and returns the metric score
    current_prompt = task_desc
    history = []
    for i in range(iterations):
        score = evaluate(current_prompt, examples, metric)
        history.append({"prompt": current_prompt, "score": score})
        # Ask the LLM to improve the prompt, showing the last 5 attempts
        meta_prompt = f"""
You are optimizing an LLM prompt. Here are previous attempts and their scores:
{history[-5:]}
The task: {task_desc}
Suggest a better prompt that might score higher. Output ONLY the new prompt."""
        current_prompt = llm.invoke(meta_prompt).content  # chat models return a message
    # Note: the candidate generated on the final pass is never scored
    return max(history, key=lambda x: x["score"])["prompt"]
⚠️
OPRO and APE require labeled evaluation data. The quality of your metric function directly caps the quality of the optimized prompt. Garbage metric → garbage prompt.
07 — Decision Guide
When to Use Each Approach
1
Manual Prompting First — the baseline
Always start here. If you can solve the task with a well-written prompt and <50 examples, you don't need programmatic optimization. Spend time on your evaluation metric instead.
2
DSPy When You Have Labeled Data — the standard
If you have 100+ labeled (input, output) examples and a clear metric, DSPy optimization will outperform manual prompting. Start with BootstrapFewShot.
3
LCEL for Complex Chains — the pattern
When your pipeline has multiple LLM calls, parallel branches, retrieval, or conditional routing, LCEL's composability and streaming support pay off.
4
Fine-tuning as the Final Step — the ultimate
When programmatic prompting plateaus, use DSPy's BootstrapFinetune or standard SFT to bake the optimized behavior into model weights.
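Step 1 above stresses the evaluation metric. A minimal exact-match metric in the (example, prediction, trace) shape DSPy optimizers expect; the `sentiment` field name is illustrative:

```python
from types import SimpleNamespace

def exact_match(example, prediction, trace=None) -> bool:
    """True iff the predicted label matches the gold label (case-insensitive)."""
    return example.sentiment.strip().lower() == prediction.sentiment.strip().lower()

# Works on any objects exposing the field, e.g. dspy.Example / dspy.Prediction
gold = SimpleNamespace(sentiment="positive")
pred = SimpleNamespace(sentiment=" Positive ")
print(exact_match(gold, pred))  # → True
```

The same callable can be passed as `metric=` to BootstrapFewShot or MIPROv2; a sloppy metric here caps everything downstream.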
Further Reading
References
Academic Papers
- Paper Khattab, O. et al. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv:2310.03714 ↗
- Paper Yang, C. et al. (2023). Large Language Models as Optimizers (OPRO). arXiv:2309.03409 ↗
- Paper Zhou, Y. et al. (2022). Large Language Models Are Human-Level Prompt Engineers (APE). arXiv:2211.01910 ↗
Practitioner Resources
- Blog Khattab, O. (2023). DSPy: Compiling Language Models into Self-Improving Pipelines. Intro and walkthrough — dspy.ai/blog ↗