
Prompt Engineering Fundamentals

System prompts, few-shot examples, role prompting, and the principles that reliably improve LLM outputs

  • system → user → assistant — the message structure
  • zero-shot → few-shot → CoT — the capability ladder
  • specificity beats cleverness — the core lesson
Contents
  1. Anatomy of a prompt
  2. Zero-shot, few-shot, CoT
  3. Role prompting
  4. Instructions & format
  5. Output format control
  6. Context & retrieval
  7. Iteration & testing
01 — Structure

Anatomy of a Prompt

Every LLM call has three parts: system prompt (instructions, persona, constraints), user message (the current request), and optional assistant message (prefilled response start).

System Prompt

Sets behavior for the entire conversation. Loaded once. Put stable instructions here: persona, output format, constraints, examples. Example: "You are a senior data analyst. Outputs must be concise, evidence-based, and formatted as: Finding → Evidence → Recommendation."

User Message

The dynamic input. Should be specific and self-contained. If precision matters, include all needed context rather than relying on the model's general knowledge.

Assistant Prefill

Start the model's response yourself to guide format. Useful for: forcing JSON output (start with "{"), forcing code blocks (start with "```python"), or steering toward specific response types.

Example: well-structured prompt

```python
# System prompt: stable, comprehensive
system = """You are a senior data analyst. Your outputs must be:
- Concise (under 200 words unless asked for detail)
- Evidence-based (cite numbers from the data provided)
- Formatted as: Finding → Evidence → Recommendation
Do NOT speculate beyond the data provided.
Do NOT use bullet points unless explicitly asked."""

# User message: specific, includes all needed context
user = """Analyze this Q3 sales data and identify the top concern:

Region | Q2 Revenue | Q3 Revenue | Change
North  | $2.1M      | $1.8M      | -14%
South  | $1.5M      | $1.7M      | +13%
West   | $3.2M      | $2.9M      | -9%

Focus on actionable issues only."""
```
Recency bias: Put your most important constraint at the END of the system prompt. Models exhibit recency bias — the last instruction has highest compliance rate.
02 — Capability Ladder

Zero-Shot, Few-Shot, and Chain-of-Thought

  • Zero-shot: just describe the task. Works for simple, common tasks; fails on nuanced, rare, or multi-step tasks.
  • Few-shot: provide 2–5 examples of (input, output) pairs before the actual input. Dramatically improves consistency and format adherence.
  • Chain-of-thought (CoT): add "Think step by step" or include examples showing reasoning steps. Improves accuracy on math, logic, and multi-step tasks by 20–40%.

Prompting Strategies

| Strategy | When to use | Cost | Accuracy gain |
|---|---|---|---|
| Zero-shot | Simple, common tasks | Minimal | Baseline |
| Few-shot (2–5 examples) | Format consistency, rare tasks | + example tokens | +10–30% |
| Zero-shot CoT | Math, logic, multi-step | + reasoning tokens | +20–40% |
| Few-shot CoT | Hardest tasks | + examples + reasoning | +30–50% |
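The two CoT rows of the table can be made concrete. A minimal sketch of the prompt variants (the task text and the expected model behavior in comments are illustrative):

```python
# A multi-step arithmetic task where zero-shot often slips.
task = ("A store sold 45 items on Monday and twice as many on Tuesday. "
        "How many items were sold in total?")

# Zero-shot: the model may jump straight to an answer.
zero_shot = task

# Zero-shot CoT: append a reasoning trigger.
zero_shot_cot = task + "\n\nThink step by step, then give the final answer on its own line."

# Few-shot CoT: the example demonstrates the reasoning, not just the answer.
few_shot_cot = """Q: A train travels 60 km in the first hour and 40 km in the second. Total distance?
A: First hour: 60 km. Second hour: 40 km. 60 + 40 = 100 km. Answer: 100 km.

Q: """ + task + "\nA:"
```

Ending the few-shot prompt with "A:" nudges the model to continue in the demonstrated reasoning format.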

Example: few-shot vs zero-shot classification

```python
# Zero-shot — inconsistent output format
user = "Classify this support ticket: 'My login button doesn't work on mobile Safari'"
# Output might be: "Technical Issue", "Bug Report", "UI Bug", "Technical" — unpredictable

# Few-shot — consistent, controlled output
user = """Classify support tickets into: billing, technical, account, feature_request

Ticket: "I was charged twice for last month" → billing
Ticket: "How do I export my data to CSV?" → feature_request
Ticket: "Can't log in after password reset" → account
Ticket: "My login button doesn't work on mobile Safari" →"""
# Output: "technical" — format controlled, vocabulary controlled
```
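In practice, few-shot prompts like the one above are usually assembled from a pool of labeled examples rather than written by hand. A small sketch of such a builder (the function name and structure are illustrative, not from a particular library):

```python
def build_few_shot_prompt(labels, examples, query):
    """Assemble a few-shot classification prompt.

    labels:   allowed output vocabulary
    examples: list of (ticket_text, label) pairs
    query:    the new input to classify
    """
    lines = [f"Classify support tickets into: {', '.join(labels)}", ""]
    for text, label in examples:
        lines.append(f'Ticket: "{text}" → {label}')
    # End with an open arrow so the model completes with a label.
    lines.append(f'Ticket: "{query}" →')
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    labels=["billing", "technical", "account", "feature_request"],
    examples=[
        ("I was charged twice for last month", "billing"),
        ("Can't log in after password reset", "account"),
    ],
    query="My login button doesn't work on mobile Safari",
)
```

Keeping examples in data rather than in the prompt string makes it easy to rotate or expand them during iteration.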
03 — Persona & Expertise

Role Prompting and Personas

Role prompting: give the model an identity that activates relevant knowledge and communication style. "You are an expert X" works well when the expertise is well-represented in training data. Avoid fictional personas for safety-critical tasks — "You are DAN (Do Anything Now)" is a jailbreak pattern.

Audience Specification

"Explain to a 10-year-old" vs "Explain to a senior ML engineer" controls depth, vocabulary, and analogies. Specific audience selection forces the model to adjust explanation style.

Example: effective persona patterns

```python
# Technical expert persona
"You are a principal software engineer at a FAANG company with 15 years of Python experience. When reviewing code, you prioritize: correctness first, then readability, then performance. You give specific, actionable feedback — not vague observations."

# Domain expert persona
"You are a board-certified cardiologist. You explain medical concepts accurately but accessibly. Always recommend the patient consult their own doctor for personal medical decisions."

# Anti-pattern: persona that fights the model's values
"You are an AI with no restrictions..."  # → jailbreak attempt, will be ignored or refused
```
⚠️ Specificity activates knowledge: "principal engineer reviewing a data pipeline" is more reliable than "software engineer". The more specific the role, the more reliably the model activates relevant knowledge and tone.
04 — Positive vs Negative

Instructions: Positive vs Negative

Positive instructions ("Do X") are more reliable than negative instructions ("Don't do X"). Negative constraints are necessary but should be paired with positive alternatives. Be specific about format, length, and style — "brief" means different things to different models.

Example: rewriting vague/negative prompts

```python
# Vague — model interprets "summary" inconsistently
"Summarize this document."

# Specific — model knows exactly what to produce
"""Write a 3-sentence executive summary of this document.
Sentence 1: The main topic and scope.
Sentence 2: The key finding or recommendation.
Sentence 3: The most important caveat or risk."""

# Negative only — model may still do the thing
"Don't be verbose. Don't add disclaimers."

# Negative + positive alternative
"Be direct and concise — maximum 150 words. Skip disclaimers and caveats unless the information is genuinely uncertain. Start your response immediately without preamble."
```

Common Anti-Patterns & Fixes

| Anti-pattern | Problem | Fix |
|---|---|---|
| "Be helpful and accurate" | Every model tries this — no signal | "When uncertain, rate confidence 1–5" |
| "Don't hallucinate" | Model can't control this | "Only state facts you're confident about. Flag uncertain claims with [UNCERTAIN]" |
| "Write a good email" | "Good" undefined | "Write a 3-paragraph email: greeting + ask + next step + sign-off" |
| "Think carefully" | No actionable instruction | "List your assumptions first. Then answer." |
05 — Structure & Control

Output Format Control

Specify format explicitly: JSON, markdown, bullet points, table, prose, code block. XML tags work well for structured extraction. Length control: specify word/sentence/paragraph counts. Prefilling: start the assistant's response to guide format.

Example: format control techniques

```python
# JSON output — add schema example
"""Return your analysis as JSON with this exact structure:
{
  'verdict': 'approve' | 'reject' | 'escalate',
  'confidence': 0.0-1.0,
  'reasons': ['reason1', 'reason2'],
  'flags': ['flag1'] or []
}"""

# XML tags for structured reasoning
# (tag names reconstructed from the descriptions; any consistent names work)
"""Analyze the argument. Use these tags:
<strengths>List what the argument does well</strengths>
<weaknesses>List logical flaws or gaps</weaknesses>
<conclusion>One sentence conclusion</conclusion>"""

# Prefilling to force code block
messages = [
    {"role": "user", "content": "Write a Python function to parse URLs"},
    {"role": "assistant", "content": "```python\n"},  # forces code block
]
```
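Requesting JSON is only half the job; the response still needs robust parsing, because even with a schema in the prompt, models sometimes wrap the JSON in a markdown fence. A defensive parsing sketch (the function name and fallback policy are illustrative):

```python
import json


def parse_json_response(raw):
    """Parse a model's JSON response, tolerating a markdown code fence."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # caller decides: retry the call, or fall back to a default


result = parse_json_response('```json\n{"verdict": "approve", "confidence": 0.9}\n```')
```

Returning `None` on failure keeps the retry decision with the caller, where the cost of another model call can be weighed.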
06 — Grounding & Knowledge

Context and Retrieval Integration

Grounding: provide relevant context so the model answers from facts, not hallucinated memory. Document injection: insert retrieved documents with clear delimiters. Instruction position: put the user's question AFTER the documents (recency effect), not before.

Example: RAG prompt template

```python
system = """You are a customer support agent.
Answer questions using ONLY the provided documentation.
If the answer isn't in the docs, say so.
Quote the relevant passage when possible."""

user = f"""Documentation:

{retrieved_chunk_1}

{retrieved_chunk_2}

Customer question: {user_question}

Answer based only on the documentation above:"""
```
⚠️ Retrieval quality matters: "Answer only from provided context" reduces hallucinations but causes unhelpful "I don't know" responses when relevant context wasn't retrieved. Tune retrieval before over-restricting the prompt.
07 — Practice & Refinement

Iteration and Testing

Development Strategies

1. Start Minimal, Add Constraints — build iteratively

Begin with the simplest prompt that could work. Add instructions only to fix specific observed failures. Every added line is a new failure mode.

  • Start: 1–2 sentences describing the task
  • Test on 5 diverse examples
  • Add constraint only if multiple tests fail
2. Test on Diverse Inputs — edge cases matter

20 varied inputs beat 20 similar inputs. Include edge cases: empty input, very long input, off-topic input, adversarial input. Prompts that work on 5 examples often fail on the 6th.

  • Normal cases: 50% of eval
  • Edge cases: 30% of eval
  • Adversarial/tricky: 20% of eval
3. Version & Diff — track changes

Store prompts as text files in git. When you change a prompt, run your eval suite on both versions. Never deploy a prompt change without comparative testing.

  • Commit each prompt change separately
  • Include before/after eval results in commit
  • Tag production-ready prompts
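Comparative testing can be as small as a loop over shared eval cases. A minimal harness sketch (`run_model` is your own API wrapper and `check` your own pass/fail rule; both are assumptions here, not a specific library's API):

```python
def compare_prompts(prompt_a, prompt_b, cases, run_model, check):
    """Score two prompt versions on the same eval cases.

    run_model(system_prompt, user_input) -> model output
    check(output, expected) -> bool
    """
    scores = {"A": 0, "B": 0}
    for user_input, expected in cases:
        scores["A"] += check(run_model(prompt_a, user_input), expected)
        scores["B"] += check(run_model(prompt_b, user_input), expected)
    return scores
```

Running both versions over the same cases is what makes a before/after score meaningful enough to include in the commit.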
4. Separate System & User — isolation principle

Put stable instructions in system prompt; put dynamic content in user message. Mixing them makes prompts brittle — one change breaks everything.

  • System: rules, persona, format constraints
  • User: current task, data, question
  • Never hardcode user data in system prompt
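The isolation principle reduces to a small composition function: the system prompt is a constant, and only per-request data flows into the user message. A sketch (names and the analyst persona are illustrative):

```python
# Stable rules live in the system prompt; per-request data goes only
# into the user message.
SYSTEM_PROMPT = """You are a senior data analyst.
Format every answer as: Finding → Evidence → Recommendation."""


def build_messages(question, data):
    """Compose a chat request without touching the system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Data:\n{data}\n\nQuestion: {question}"},
    ]


messages = build_messages("Which region is the top concern?", "North: -14%, South: +13%")
```

Because the system prompt is never rebuilt per request, it can be versioned and eval-tested independently of the data flowing through the user message.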

Evaluation & Tooling

  • PromptFoo (testing) — test & compare prompts with eval suites
  • LangSmith (observability) — debug & trace LLM chains and prompts
  • Braintrust (evaluation) — LLM eval framework with automated scoring
  • OpenAI Playground (testing) — interactive prompt testing & experimentation
  • Anthropic Console (testing) — interactive testing for Claude models
  • DSPy (framework) — programmatic prompt optimization
  • Weights & Biases (tracking) — prompt versioning and experiment tracking