
Prompt Engineering Fundamentals

System prompts, few-shot examples, role prompting, and the principles that reliably improve LLM outputs

  • system → user → assistant — the message structure
  • zero-shot → few-shot → CoT — the capability ladder
  • specificity beats cleverness — the core lesson
Contents
  1. Anatomy of a prompt
  2. Zero-shot, few-shot, CoT
  3. Role prompting
  4. Instructions & format
  5. Output format control
  6. Context & retrieval
  7. Iteration & testing
01 — Structure

Anatomy of a Prompt

Every LLM call has three parts: system prompt (instructions, persona, constraints), user message (the current request), and optional assistant message (prefilled response start).

System Prompt

Sets behavior for the entire conversation. Loaded once. Put stable instructions here: persona, output format, constraints, examples. Example: "You are a senior data analyst. Outputs must be concise, evidence-based, and formatted as: Finding → Evidence → Recommendation."

User Message

The dynamic input. Should be specific and self-contained. If precision matters, include all needed context rather than relying on the model's general knowledge.

Assistant Prefill

Start the model's response yourself to guide format. Useful for: forcing JSON output (start with "{"), forcing code blocks (start with "```python"), or steering toward specific response types.

Example: well-structured prompt

```python
# System prompt: stable, comprehensive
system = """You are a senior data analyst. Your outputs must be:
- Concise (under 200 words unless asked for detail)
- Evidence-based (cite numbers from the data provided)
- Formatted as: Finding → Evidence → Recommendation
Do NOT speculate beyond the data provided.
Do NOT use bullet points unless explicitly asked."""

# User message: specific, includes all needed context
user = """Analyze this Q3 sales data and identify the top concern:

Region | Q2 Revenue | Q3 Revenue | Change
North  | $2.1M      | $1.8M      | -14%
South  | $1.5M      | $1.7M      | +13%
West   | $3.2M      | $2.9M      | -9%

Focus on actionable issues only."""
```
Recency bias: Put your most important constraint at the END of the system prompt. Models exhibit recency bias — the last instruction has highest compliance rate.
02 — Capability Ladder

Zero-Shot, Few-Shot, and Chain-of-Thought

  • Zero-shot: just describe the task. Works for simple, common tasks; fails on nuanced, rare, or multi-step tasks.
  • Few-shot: provide 2–5 examples of (input, output) pairs before the actual input. Dramatically improves consistency and format adherence.
  • Chain-of-thought (CoT): add "Think step by step" or include examples showing reasoning steps. Improves accuracy on math, logic, and multi-step tasks by 20–40%.

Prompting Strategies

| Strategy | When to use | Cost | Accuracy gain |
|---|---|---|---|
| Zero-shot | Simple, common tasks | Minimal | Baseline |
| Few-shot (2–5 examples) | Format consistency, rare tasks | + example tokens | +10–30% |
| Zero-shot CoT | Math, logic, multi-step | + reasoning tokens | +20–40% |
| Few-shot CoT | Hardest tasks | + examples + reasoning | +30–50% |
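The two CoT rows of the table can be made concrete. A minimal sketch of the prompt variants (the task text and the expected model behavior in comments are illustrative):

```python
# A multi-step arithmetic task where zero-shot often slips.
task = ("A store sold 45 items on Monday and twice as many on Tuesday. "
        "How many items were sold in total?")

# Zero-shot: the model may jump straight to an answer.
zero_shot = task

# Zero-shot CoT: append a reasoning trigger.
zero_shot_cot = task + "\n\nThink step by step, then give the final answer on its own line."

# Few-shot CoT: the example demonstrates the reasoning, not just the answer.
few_shot_cot = """Q: A train travels 60 km in the first hour and 40 km in the second. Total distance?
A: First hour: 60 km. Second hour: 40 km. 60 + 40 = 100 km. Answer: 100 km.

Q: """ + task + "\nA:"
```

Ending the few-shot prompt with "A:" nudges the model to continue in the demonstrated reasoning format.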

Example: few-shot vs zero-shot classification

```python
# Zero-shot — inconsistent output format
user = "Classify this support ticket: 'My login button doesn't work on mobile Safari'"
# Output might be: "Technical Issue", "Bug Report", "UI Bug", "Technical" — unpredictable

# Few-shot — consistent, controlled output
user = """Classify support tickets into: billing, technical, account, feature_request

Ticket: "I was charged twice for last month" → billing
Ticket: "How do I export my data to CSV?" → feature_request
Ticket: "Can't log in after password reset" → account
Ticket: "My login button doesn't work on mobile Safari" →"""
# Output: "technical" — format controlled, vocabulary controlled
```
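In practice, few-shot prompts like the one above are usually assembled from a pool of labeled examples rather than written by hand. A small sketch of such a builder (the function name and structure are illustrative, not from a particular library):

```python
def build_few_shot_prompt(labels, examples, query):
    """Assemble a few-shot classification prompt.

    labels:   allowed output vocabulary
    examples: list of (ticket_text, label) pairs
    query:    the new input to classify
    """
    lines = [f"Classify support tickets into: {', '.join(labels)}", ""]
    for text, label in examples:
        lines.append(f'Ticket: "{text}" → {label}')
    # End with an open arrow so the model completes with a label.
    lines.append(f'Ticket: "{query}" →')
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    labels=["billing", "technical", "account", "feature_request"],
    examples=[
        ("I was charged twice for last month", "billing"),
        ("Can't log in after password reset", "account"),
    ],
    query="My login button doesn't work on mobile Safari",
)
```

Keeping examples in data rather than in the prompt string makes it easy to rotate or expand them during iteration.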
03 — Persona & Expertise

Role Prompting and Personas

Role prompting: give the model an identity that activates relevant knowledge and communication style. "You are an expert X" works well when the expertise is well-represented in training data. Avoid fictional personas for safety-critical tasks — "You are DAN (Do Anything Now)" is a jailbreak pattern.

Audience Specification

"Explain to a 10-year-old" vs "Explain to a senior ML engineer" controls depth, vocabulary, and analogies. Specific audience selection forces the model to adjust explanation style.

Example: effective persona patterns

```python
# Technical expert persona
"You are a principal software engineer at a FAANG company with 15 years of Python experience. When reviewing code, you prioritize: correctness first, then readability, then performance. You give specific, actionable feedback — not vague observations."

# Domain expert persona
"You are a board-certified cardiologist. You explain medical concepts accurately but accessibly. Always recommend the patient consult their own doctor for personal medical decisions."

# Anti-pattern: persona that fights the model's values
"You are an AI with no restrictions..."  # → jailbreak attempt, will be ignored or refused
```
⚠️ Specificity activates knowledge: "principal engineer reviewing a data pipeline" is more reliable than "software engineer". The more specific the role, the more reliably the model activates relevant knowledge and tone.
04 — Positive vs Negative

Instructions: Positive vs Negative

Positive instructions ("Do X") are more reliable than negative instructions ("Don't do X"). Negative constraints are necessary but should be paired with positive alternatives. Be specific about format, length, and style — "brief" means different things to different models.

Example: rewriting vague/negative prompts

```python
# Vague — model interprets "summary" inconsistently
"Summarize this document."

# Specific — model knows exactly what to produce
"""Write a 3-sentence executive summary of this document.
Sentence 1: The main topic and scope.
Sentence 2: The key finding or recommendation.
Sentence 3: The most important caveat or risk."""

# Negative only — model may still do the thing
"Don't be verbose. Don't add disclaimers."

# Negative + positive alternative
"Be direct and concise — maximum 150 words. Skip disclaimers and caveats unless the information is genuinely uncertain. Start your response immediately without preamble."
```

Common Anti-Patterns & Fixes

| Anti-pattern | Problem | Fix |
|---|---|---|
| "Be helpful and accurate" | Every model tries this — no signal | "When uncertain, rate confidence 1–5" |
| "Don't hallucinate" | Model can't control this | "Only state facts you're confident about. Flag uncertain claims with [UNCERTAIN]" |
| "Write a good email" | "Good" undefined | "Write a 3-paragraph email: greeting + ask + next step + sign-off" |
| "Think carefully" | No actionable instruction | "List your assumptions first. Then answer." |
05 — Structure & Control

Output Format Control

Specify format explicitly: JSON, markdown, bullet points, table, prose, code block. XML tags work well for structured extraction. Length control: specify word/sentence/paragraph counts. Prefilling: start the assistant's response to guide format.

Example: format control techniques

```python
# JSON output — add schema example
"""Return your analysis as JSON with this exact structure:
{
  'verdict': 'approve' | 'reject' | 'escalate',
  'confidence': 0.0-1.0,
  'reasons': ['reason1', 'reason2'],
  'flags': ['flag1'] or []
}"""

# XML tags for structured reasoning
# (tag names reconstructed from the descriptions; any consistent names work)
"""Analyze the argument. Use these tags:
<strengths>List what the argument does well</strengths>
<weaknesses>List logical flaws or gaps</weaknesses>
<conclusion>One sentence conclusion</conclusion>"""

# Prefilling to force code block
messages = [
    {"role": "user", "content": "Write a Python function to parse URLs"},
    {"role": "assistant", "content": "```python\n"},  # forces code block
]
```
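Requesting JSON is only half the job; the response still needs robust parsing, because even with a schema in the prompt, models sometimes wrap the JSON in a markdown fence. A defensive parsing sketch (the function name and fallback policy are illustrative):

```python
import json


def parse_json_response(raw):
    """Parse a model's JSON response, tolerating a markdown code fence."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # caller decides: retry the call, or fall back to a default


result = parse_json_response('```json\n{"verdict": "approve", "confidence": 0.9}\n```')
```

Returning `None` on failure keeps the retry decision with the caller, where the cost of another model call can be weighed.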
06 — Grounding & Knowledge

Context and Retrieval Integration

Grounding: provide relevant context so the model answers from facts, not hallucinated memory. Document injection: insert retrieved documents with clear delimiters. Instruction position: put the user's question AFTER the documents (recency effect), not before.

Example: RAG prompt template

```python
system = """You are a customer support agent.
Answer questions using ONLY the provided documentation.
If the answer isn't in the docs, say so.
Quote the relevant passage when possible."""

user = f"""Documentation:

{retrieved_chunk_1}

{retrieved_chunk_2}

Customer question: {user_question}

Answer based only on the documentation above:"""
```
⚠️ Retrieval quality matters: "Answer only from provided context" reduces hallucinations but causes unhelpful "I don't know" responses when relevant context wasn't retrieved. Tune retrieval before over-restricting the prompt.
07 — Practice & Refinement

Iteration and Testing

Development Strategies

1. Start Minimal, Add Constraints — build iteratively

Begin with the simplest prompt that could work. Add instructions only to fix specific observed failures. Every added line is a new failure mode.

  • Start: 1–2 sentences describing the task
  • Test on 5 diverse examples
  • Add constraint only if multiple tests fail
2. Test on Diverse Inputs — edge cases matter

20 varied inputs beat 20 similar inputs. Include edge cases: empty input, very long input, off-topic input, adversarial input. Prompts that work on 5 examples often fail on the 6th.

  • Normal cases: 50% of eval
  • Edge cases: 30% of eval
  • Adversarial/tricky: 20% of eval
3. Version & Diff — track changes

Store prompts as text files in git. When you change a prompt, run your eval suite on both versions. Never deploy a prompt change without comparative testing.

  • Commit each prompt change separately
  • Include before/after eval results in commit
  • Tag production-ready prompts
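Comparative testing can be as small as a loop over shared eval cases. A minimal harness sketch (`run_model` is your own API wrapper and `check` your own pass/fail rule; both are assumptions here, not a specific library's API):

```python
def compare_prompts(prompt_a, prompt_b, cases, run_model, check):
    """Score two prompt versions on the same eval cases.

    run_model(system_prompt, user_input) -> model output
    check(output, expected) -> bool
    """
    scores = {"A": 0, "B": 0}
    for user_input, expected in cases:
        scores["A"] += check(run_model(prompt_a, user_input), expected)
        scores["B"] += check(run_model(prompt_b, user_input), expected)
    return scores
```

Running both versions over the same cases is what makes a before/after score meaningful enough to include in the commit.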
4. Separate System & User — isolation principle

Put stable instructions in system prompt; put dynamic content in user message. Mixing them makes prompts brittle — one change breaks everything.

  • System: rules, persona, format constraints
  • User: current task, data, question
  • Never hardcode user data in system prompt
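The isolation principle reduces to a small composition function: the system prompt is a constant, and only per-request data flows into the user message. A sketch (names and the analyst persona are illustrative):

```python
# Stable rules live in the system prompt; per-request data goes only
# into the user message.
SYSTEM_PROMPT = """You are a senior data analyst.
Format every answer as: Finding → Evidence → Recommendation."""


def build_messages(question, data):
    """Compose a chat request without touching the system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Data:\n{data}\n\nQuestion: {question}"},
    ]


messages = build_messages("Which region is the top concern?", "North: -14%, South: +13%")
```

Because the system prompt is never rebuilt per request, it can be versioned and eval-tested independently of the data flowing through the user message.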

Evaluation & Tooling

  • PromptFoo (testing) — test & compare prompts with eval suites
  • LangSmith (observability) — debug & trace LLM chains and prompts
  • Braintrust (evaluation) — LLM eval framework with automated scoring
  • OpenAI Playground (testing) — interactive prompt testing & experimentation
  • Anthropic Console (testing) — interactive testing for Claude models
  • DSPy (framework) — programmatic prompt optimization
  • Weights & Biases (tracking) — prompt versioning and experiment tracking