Basic Prompting

Zero-Shot Prompting

Ask the model to do something without showing it any examples — just a clear, specific instruction. Deceptively powerful when written well.

At a glance: 0 examples needed; specificity is the key lever; the best first try for any new task.


SECTION 01

What zero-shot actually means

Imagine hiring a contractor you've never worked with before. You don't show them a finished kitchen to copy — you just tell them exactly what you want. Zero-shot prompting is that: give the model a task description, no examples, and let it draw on everything it learned during training.

The "shot" comes from ML terminology where a "shot" means a training example. Zero-shot = zero examples provided at prompt time. The model already knows how to do an enormous range of tasks — you just need to ask clearly enough.

When to use it: Start every new task with zero-shot. It's free, fast, and surprisingly good. Only move to few-shot or fine-tuning if zero-shot measurably falls short.
SECTION 02

Why it works

Modern LLMs are trained on trillions of tokens covering almost every task humans write about — summaries, translations, code reviews, recipes, legal memos. When you write "Summarise this email in 3 bullet points," the model has seen thousands of examples of exactly that task during training. You're not teaching it; you're activating what it already knows.

Zero-shot works because:

- The training corpus already contains countless examples of common tasks, so a clear instruction activates knowledge the model already has.
- Instruction tuning teaches models to map a task description directly to the expected behaviour, no demonstrations required.
Zero-shot breaks down for novel combinations the model hasn't seen — unusual output formats, highly domain-specific jargon, or tasks that require multi-step reasoning chains. Those need few-shot or chain-of-thought.

SECTION 03

The specificity rule

The number one variable in zero-shot quality is how specific your instruction is. Vague instructions produce vague outputs.

āŒ Weak: "Summarise this text." āœ“ Better: "Summarise this in 3 bullet points, each under 15 words, focusing only on decisions made, not background context." āŒ Weak: "Write a product description." āœ“ Better: "Write a 2-sentence product description for a B2B SaaS audience. Lead with the outcome (what it saves/improves), then the mechanism. Avoid adjectives like 'powerful' or 'seamless'."

Four dimensions to specify:

- Format: the shape of the output (bullets, JSON, table, prose)
- Length: an explicit word, sentence, or bullet count
- Audience: who it's for, and the tone that implies
- Constraints: what to avoid or leave out
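As a sketch, one instruction that pins down all four dimensions might look like this (the task and wording are invented for illustration):

```python
def build_summary_prompt(text: str) -> str:
    """Compose a zero-shot prompt that specifies format, length, audience, and constraints."""
    return (
        "Summarise the text below in exactly 3 bullet points "    # format + length
        "of no more than 15 words each, "
        "for an executive audience. "                             # audience
        "Focus only on decisions made; omit background context."  # constraints
        f"\n\nText: {text}"
    )

prompt = build_summary_prompt("The team agreed to ship v2 on Friday.")
```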

SECTION 04

Template anatomy

# Anatomy of a strong zero-shot prompt

[ROLE / CONTEXT - optional but often helpful]
You are a technical writer reviewing Python code.

[TASK - be specific]
Review the following function and identify:
1. Any bugs or edge cases the author missed
2. One concrete improvement to make it more Pythonic

[INPUT]
{{code}}

[OUTPUT FORMAT]
Respond in this exact structure:
- Bugs found: (list each or "None")
- Improvement: (one sentence with example)

[CONSTRAINTS - what to avoid]
Do not rewrite the entire function. Focus only on the two points above.
The double-brace trick: Use {{variable}} as a placeholder when building prompt templates in code. It makes the variable slot visible at a glance, and because doubled braces are escape sequences in Python's .format() and f-strings, a {{variable}} slot survives an outer formatting pass intact (it becomes a literal {variable}).
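A minimal illustration of the escaping behaviour, assuming a two-stage build where the template string itself goes through .format() (the template content is invented):

```python
# Doubled braces are escapes in str.format(), so literal braces in the prompt
# (here, a JSON hint) survive the formatting pass while {code} is substituted.
template = 'Review this code and reply as JSON {{"bugs": [...]}}:\n{code}'

prompt = template.format(code="def f(): pass")
# prompt now contains the literal text {"bugs": [...]} plus the inserted code
```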
SECTION 05

When zero-shot breaks down

Symptom | Likely cause | Fix
Wrong format (e.g. prose instead of JSON) | Format not specified | Add explicit output format with an example skeleton
Too long / too short | Length not constrained | Specify exact word/bullet count
Hallucinated facts | Model filling knowledge gaps | Add "Only use information in the provided text. If unsure, say so."
Wrong tone (too casual / too formal) | Audience not specified | Add "Write for [audience]" and one style example
Multi-step reasoning wrong | Task requires chain of thought | Switch to CoT prompting — add "Think step by step"
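The hallucination fix can be sketched as a small prompt builder; the function name and fallback token are illustrative, not a convention from any library:

```python
def grounded_prompt(question: str, source_text: str) -> str:
    """Restrict the model to the provided text and give it an explicit fallback."""
    return (
        "Answer the question using only information in the text below.\n"
        "If the text does not contain the answer, reply exactly: NOT FOUND.\n\n"
        f"Question: {question}\n"
        f"Text: {source_text}"
    )
```

The explicit fallback matters: without a sanctioned way to decline, models tend to fill gaps with plausible-sounding guesses.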
SECTION 06

Practical recipes

import anthropic

client = anthropic.Anthropic()

# Recipe 1: Structured extraction
def extract_action_items(meeting_notes: str) -> str:
    return client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": f"""
Extract action items from these meeting notes.
Format: markdown checklist.
Each item: "- [ ] [Owner]: [Action] by [Date if mentioned]"
If no date mentioned, omit the date clause.
Only include concrete tasks, not discussion points.

Notes: {meeting_notes}
"""}],
    ).content[0].text

# Recipe 2: Classification with confidence
def classify_sentiment(text: str) -> str:
    return client.messages.create(
        model="claude-opus-4-6",
        max_tokens=50,
        messages=[{"role": "user", "content": f"""
Classify the sentiment of this customer review.
Reply with exactly one line: POSITIVE, NEGATIVE, or NEUTRAL
Then a confidence score: HIGH, MEDIUM, or LOW
Format: SENTIMENT | CONFIDENCE

Review: {text}
"""}],
    ).content[0].text

# Recipe 3: Constrained rewrite
def make_concise(text: str, max_words: int = 50) -> str:
    return client.messages.create(
        model="claude-opus-4-6",
        max_tokens=200,
        messages=[{"role": "user", "content": f"""
Rewrite the following in {max_words} words or fewer.
Preserve all key facts. Use plain language. Active voice only.

Text: {text}
"""}],
    ).content[0].text
The 30-second test: Before adding examples or switching to a different technique, try making your zero-shot prompt more specific. In most cases, a rewritten instruction outperforms a lazy prompt with examples attached.

Zero-shot chain-of-thought

Zero-shot chain-of-thought (CoT) prompting adds the phrase "Let's think step by step" to a question before requesting the answer, eliciting explicit reasoning traces from models without requiring manually crafted examples. This simple addition consistently improves accuracy on multi-step reasoning tasks — arithmetic, logical deduction, symbolic manipulation — by prompting the model to externalize intermediate reasoning rather than attempting to produce the final answer directly. The reasoning trace also provides interpretability: when the model makes an error, the trace reveals where the reasoning went wrong, enabling targeted prompt improvements.

Comparison: zero-shot vs few-shot vs fine-tuning

Zero-shot, few-shot, and fine-tuning represent different points on the cost-accuracy tradeoff curve for task adaptation. Zero-shot requires no examples and no training but achieves lower accuracy on specialized tasks with unusual output formats. Few-shot improves accuracy with 3–10 examples provided in context but consumes tokens and may not generalize reliably. Fine-tuning produces the highest accuracy on well-defined tasks but requires labeled datasets, compute, and ongoing maintenance. Starting with zero-shot, adding few-shot if accuracy is insufficient, and escalating to fine-tuning only when few-shot plateaus is the recommended progression.

ApproachData requiredLatency costAccuracy on novel tasks
Zero-shotNoneNoneMedium (instruction following)
Few-shot3–10 examplesExample tokens per requestMedium-high
Fine-tuning100s–1000s examplesNone at inferenceHigh (in-distribution)

Zero-shot generalization is constrained by the model's instruction-following capability, which varies significantly across model families and sizes. Small models (under 7B parameters) with minimal instruction tuning often fail to interpret complex zero-shot task specifications correctly, defaulting to general text completion rather than following the specified task format. Larger models, and models fine-tuned on diverse instruction datasets, handle zero-shot task specification more reliably. When zero-shot performance is poor on a task that larger models handle correctly, the bottleneck is typically instruction comprehension rather than task knowledge: the model has the relevant knowledge but cannot reliably follow the task and format specification.

Zero-shot evaluation on held-out task categories is the standard methodology for measuring a model's generalization capability beyond its training distribution. FLAN-style evaluation, where models are tested on tasks with instruction formats that were not seen during instruction tuning, measures the extent to which instruction following generalizes to novel task formulations. Models that perform well on zero-shot evaluation across diverse task categories with unseen instruction formats are better candidates for production deployment on new tasks, because they are less likely to require prompt engineering iterations when task specifications change.

Negative zero-shot instructions — explicitly telling the model what not to do rather than what to do — are often more effective than positive instructions for constraining output format and content. Specifying "Do not include explanations, preamble, or apologies — output only the requested JSON" produces cleaner structured outputs than positive specifications of the desired format alone. The combination of both positive (what to produce) and negative (what to exclude) instructions in zero-shot prompts provides the clearest behavioral specification and reduces format variance across responses.
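A sketch combining positive and negative instructions in one extraction prompt (the task and field names are invented for illustration):

```python
def extraction_prompt(text: str) -> str:
    return (
        # Positive: what to produce, with the exact target shape.
        'Extract the people mentioned in the text as a JSON array of '
        '{"name": ..., "role": ...} objects.\n'
        # Negative: what to exclude, to cut format variance.
        "Do not include explanations, preamble, or apologies. "
        "Output only the JSON array.\n\n"
        f"Text: {text}"
    )
```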

Role prompting in zero-shot settings significantly affects response quality for domain-specific tasks. Instructing the model to "act as an experienced data scientist" or "respond as a senior software engineer reviewing this code" activates relevant knowledge domains and response styles associated with that role in the model's training data. Role prompts are most effective when the role is specific and the associated expertise is dense in the training distribution — general professional roles outperform highly specific or fictional roles because the training corpus contains more examples of how professionals in common fields communicate.

Output format specification is the most consistently impactful element of zero-shot prompts for structured generation tasks. Providing an exact template of the expected output format — including field names, data types, and example values — reduces format variance dramatically compared to descriptive format instructions. For JSON extraction tasks, including a sample JSON structure with placeholder values guides the model to produce correctly nested and typed output far more reliably than instructions like "return a JSON object with these fields." The more precisely the expected output format is specified, the less work downstream parsing code must do to handle format variations.
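One way to sketch this, assuming an invented ticket-extraction task: serialise a skeleton with placeholder values and embed it in the prompt, rather than describing the fields in prose:

```python
import json

# Skeleton with placeholder values; field names and types are illustrative.
SKELETON = json.dumps(
    {
        "title": "<string>",
        "priority": "<low|medium|high>",
        "due_date": "<YYYY-MM-DD or null>",
    },
    indent=2,
)

def ticket_prompt(email: str) -> str:
    return (
        "Extract a support ticket from the email below.\n"
        f"Return only JSON matching this skeleton exactly:\n{SKELETON}\n\n"
        f"Email: {email}"
    )
```

Downstream code can then call json.loads on the response and validate it against the same field names, instead of coping with free-form variations.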