Core Techniques

Building with GenAI

Combining prompting, retrieval, agents, and fine-tuning into a working system. Learn how to choose the right tool for each problem.

01 — Foundation

Building with GenAI: The 5 Core Blocks

Most tutorials teach each GenAI technique in isolation: here's how to prompt, here's how to build RAG, here's how to build agents. But in practice, you rarely use just one technique. Real applications combine multiple blocks depending on the problem constraints. This page teaches you the decision framework for choosing between them and how they fit together.

The Five Building Blocks

Prompting is the foundation. Feed an LLM a task and get an answer. No external data, no tools, no loops. It's fast, cheap, and simple — but it only works for tasks the model can solve from its training data.

Retrieval-Augmented Generation (RAG) solves the knowledge problem. When your task requires specific facts, documents, or domain data the model doesn't know, retrieve relevant context from a vector database or search index and inject it into the prompt. Now the model answers from facts, not from faulty memory.

Agents solve the action problem. When the task requires multiple steps, external data, or decisions based on observation, give the LLM tools and let it call them in a loop. It reasons, acts, observes, and repeats until the goal is met.

Fine-Tuning optimizes for task-specific performance. When prompting doesn't give enough accuracy, or when you need task-specific reasoning patterns or output formats, train the model on examples of correct behavior. This improves quality but costs time and data.

Data Engineering is the glue. Clean, structured, labeled data feeds every technique above. Without good data, prompting fails, RAG retrieves garbage, agents make wrong decisions, and fine-tuning learns from noise.

💡 Key principle: Start with prompting. Only add complexity when prompting fails. Each layer (RAG, agents, fine-tuning) adds latency, cost, and operational burden. Use them only when necessary.
02 — Strategy

The GenAI Building Decision Tree

Use this decision tree to choose the right combination of techniques for your problem. Answer each question in order and follow the path.

START: Can prompting alone solve this?
  YES → Use prompting. Ship it. If the LLM knows enough facts to answer correctly, you need no RAG and no agents. Fast, cheap, done.
  NO → Does the task need context injection?
    YES → Build RAG. The LLM needs specific facts from your data. Retrieve documents and inject them into the prompt; the LLM answers from facts, not hallucinations.
    NO → Does the task need actions/tools?
      YES → Build agents. The task requires tools, external APIs, or steps that depend on observation. The agent loops: reason → act → observe → repeat.
      NO → Does the task need task-specific accuracy?
        YES → Fine-tune. Prompting gives 70% accuracy, which isn't enough. Train on labeled examples of correct behavior.
        NO → Do data engineering. The problem is noisy, inconsistent data. Clean, normalize, and label it, then try prompting again.
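The tree reads naturally as code. Here is a minimal sketch that encodes the same questions in order; `TaskProfile` and `choose_technique` are illustrative names, not part of any library:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """Answers to the decision-tree questions for one problem."""
    prompting_suffices: bool
    needs_context_injection: bool
    needs_actions_or_tools: bool
    needs_task_specific_accuracy: bool

def choose_technique(task: TaskProfile) -> str:
    """Walk the tree top to bottom and return the first technique that matches."""
    if task.prompting_suffices:
        return "prompting"
    if task.needs_context_injection:
        return "rag"
    if task.needs_actions_or_tools:
        return "agents"
    if task.needs_task_specific_accuracy:
        return "fine-tuning"
    return "data-engineering"

# FAQ bot over company policies: the model lacks the facts, but one retrieval suffices.
print(choose_technique(TaskProfile(False, True, False, False)))  # rag
```

Note the ordering is the whole point: each `if` is a simplicity gate, so the function can never return "agents" for a problem RAG already solves.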

Reading the Tree

Start at the top with "Can prompting alone solve this?" and follow branches down. Each decision eliminates options and narrows the solution. The tree encodes the principle: use the simplest approach that solves the problem. Never use agents when RAG works. Never fine-tune when prompting is sufficient.

Real-world example: Customer support. Can prompting answer FAQs? Probably. Add RAG if you need company-specific policies. Add agents if you need to look up customer accounts or create tickets. Fine-tune if accuracy matters more than speed. Data engineering if tickets are messy.
03 — Techniques

The Five Building Blocks Explained

Each block solves a specific class of problems. Know what each is for, when it works, and when it fails.

1. Prompting

What: Feed a task to an LLM and get an answer.
When to use: The task knowledge is in the model's training data, and the answer depends only on reasoning or common knowledge.
Examples: Summarizing text, answering trivia, drafting emails, classifying sentiment.
Latency: One LLM call, ~100ms-1s.
Cost: Low.
Failure mode: The model doesn't know the facts and hallucinates.
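As a concrete illustration, a few-shot sentiment-classification prompt can be assembled like this. The builder function and the `Text:`/`Label:` layout are illustrative conventions, not a library API:

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: task description, labeled examples, then the query."""
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # Leave the final label blank so the model completes it.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each text as positive or negative.",
    [("I love this product", "positive"),
     ("Terrible support experience", "negative")],
    "The checkout flow was smooth",
)
print(prompt)
```

The resulting string is sent as a single model call; no retrieval, tools, or training is involved, which is exactly why prompting is the cheapest block.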

2. Retrieval-Augmented Generation (RAG)

What: Retrieve relevant documents from a database, inject them into the prompt, then let the LLM answer from that context.
When to use: The task requires specific facts, policies, or domain data that aren't in the model's training data.
Examples: Customer support, legal document Q&A, internal knowledge search, product documentation.
Latency: Retrieval (~50-200ms) + LLM call (~100ms-1s).
Cost: Moderate (vector DB lookups + LLM).
Failure mode: The retriever doesn't find relevant documents, or the documents don't answer the question.
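The retrieve-then-inject step can be sketched with a toy bag-of-words similarity. Real systems use dense embedding models and a vector database; every name and document here is illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top_k."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

docs = [
    "Refunds are processed within 14 days of the return request.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available Monday through Friday, 9am to 5pm.",
]
context = retrieve("how long do refunds take", docs, top_k=1)
# Inject the retrieved context into the prompt.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: how long do refunds take"
print(prompt)
```

The failure mode listed above lives entirely in `retrieve`: if the ranking misses the refund document, the LLM never sees the fact and cannot answer correctly no matter how good the prompt is.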

3. Agents

What: Give the LLM tools and let it decide when to use them. The agent loops: reason about the task, call a tool, observe the result, repeat until done.
When to use: The task requires multiple steps, external data that changes dynamically, or decisions based on intermediate results.
Examples: Multi-step research, debugging code, booking travel, exploring a database.
Latency: Multiple LLM calls (3-10x slower than a single call).
Cost: High (per-step LLM calls).
Failure mode: The agent picks wrong tools, loops infinitely, or forgets context.
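A stripped-down version of the reason → act → observe loop, with the LLM's decisions mocked out as a fixed plan so the mechanics are visible. All tool names and the `plan` structure are illustrative:

```python
# Tools the agent may call; in production these would hit real APIs.
def search_docs(query: str) -> str:
    return f"doc snippet about {query}"

def get_weather(city: str) -> str:
    return f"18C and sunny in {city}"

TOOLS = {"search_docs": search_docs, "get_weather": get_weather}

def run_agent(plan: list[dict], max_steps: int = 5) -> list[str]:
    """Execute the act/observe half of an agent loop.

    `plan` stands in for the LLM's reasoning: a real agent would ask the
    model which tool to call next after each observation.
    """
    observations = []
    for step in plan[:max_steps]:  # loop limit guards against runaway agents
        tool = TOOLS[step["tool"]]
        result = tool(**step["args"])
        observations.append(result)  # fed back to the model on the next turn
    return observations

obs = run_agent([
    {"tool": "get_weather", "args": {"city": "Paris"}},
    {"tool": "search_docs", "args": {"query": "packing list"}},
])
print(obs)
```

The per-step cost is visible here: each dict in `plan` corresponds to one extra LLM round-trip in a real agent, which is where the 3-10x latency multiplier comes from.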

4. Fine-Tuning

What: Train an LLM on examples of correct behavior for your specific task, then use the tuned model for inference.
When to use: Prompting gives 60-80% accuracy and you need 90%+, or the task has a specific output format, tone, or reasoning pattern.
Examples: Specialized classification, domain-specific content generation, output parsing.
Latency: Same as prompting (~100ms-1s).
Cost: High upfront (training), then low per call.
Failure mode: Not enough training data, the model overfits to examples, or training teaches wrong patterns.
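Preparing the training data is most of the work. This is a hedged sketch of turning labeled pairs into chat-style JSONL records, the general shape managed fine-tuning APIs accept; the exact schema varies by provider, so check the one you use:

```python
import json

def to_jsonl_records(pairs: list[tuple[str, str]]) -> list[str]:
    """Turn (input, ideal output) pairs into chat-style JSONL lines."""
    lines = []
    for prompt, completion in pairs:
        record = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]}
        lines.append(json.dumps(record))
    return lines

pairs = [
    ("Classify: 'battery died after a week'", "negative"),
    ("Classify: 'arrived early, works great'", "positive"),
]
split = max(1, int(len(pairs) * 0.8))  # reserve ~20% of data for evaluation
train, eval_set = pairs[:split], pairs[split:]
for line in to_jsonl_records(train):
    print(line)
```

Holding out the evaluation slice before training is what lets you detect the overfitting failure mode later: a model that aces `train` but misses on `eval_set` has memorized, not learned.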

5. Data Engineering

What: Clean, validate, label, and structure your data so that prompting, RAG, and fine-tuning work better.
When to use: Raw data is messy, incomplete, or inconsistent, or you need reliable ground truth for evaluation or training.
Examples: Cleaning customer records for fine-tuning, labeling examples for evaluation, extracting structured fields from unstructured text.
Latency: Offline work, not part of inference.
Cost: High (manual labeling or automation).
Failure mode: Labels are wrong or biased, affecting all downstream tasks.
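A minimal sketch of the cleaning step: normalize fields, drop invalid rows, and dedupe on a stable key before anything reaches an index or a training set. The record shape is illustrative:

```python
import re

def clean_records(records: list[dict]) -> list[dict]:
    """Normalize, validate, and dedupe raw records before they feed
    prompting context, RAG indexing, or fine-tuning data."""
    seen = set()
    cleaned = []
    for r in records:
        email = r.get("email", "").strip().lower()
        name = re.sub(r"\s+", " ", r.get("name", "")).strip().title()
        if not email or "@" not in email:
            continue  # drop invalid rows rather than index garbage
        if email in seen:
            continue  # dedupe on the stable key
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

raw = [
    {"name": "ada   lovelace", "email": "Ada@Example.com "},
    {"name": "ada lovelace", "email": "ada@example.com"},  # duplicate
    {"name": "no email", "email": ""},                     # invalid
]
print(clean_records(raw))  # one clean record survives
```

Three raw rows become one trustworthy record, which is the point: every downstream block inherits the quality of this output.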

⚠️ The hidden cost: Data engineering often takes 60% of project time but is invisible until it's missing. Allocate time for data cleaning and labeling early, not as an afterthought.
04 — Mindset

Build vs Buy: When to Roll Your Own

Every GenAI technique can be built from scratch or bought as a managed service. Know when each makes sense.

Build Your Own

Prompting: Always build. Prompting is free and requires no infrastructure.
RAG: Build if you have full control of documents and queries. Buy if you need commercial-grade retrieval, enterprise scaling, or managed updates.
Agents: Build simple agents in-house using frameworks like LangGraph. Buy if you need multi-agent orchestration, human-in-the-loop review, or production monitoring.
Fine-tuning: Build if you have compute budget and 100+ labeled examples. Buy managed tuning (OpenAI, Anthropic) if you want hands-off training.
Data engineering: Always build this in-house — it's domain-specific and iterative.

Buy/Use a Service

Managed services (LlamaIndex, Pinecone, LangChain Platform, Anthropic Workbench) reduce operational overhead. They handle infrastructure, scaling, monitoring, and updates. Cost per request is higher, but total cost is often lower because you don't pay for engineering time and ops burden.

💡 Decision rule: Build if it's core to your competitive advantage. Buy if it's a commodity. Most companies should buy RAG and use their own prompting/evaluation.
| Approach | Best For | Pros | Cons |
| --- | --- | --- | --- |
| Managed API (OpenAI, Anthropic) | Prototyping, variable load | Zero infra, latest models, fast to ship | Per-token cost at scale, data leaves your boundary |
| Open model + cloud GPU | Cost-sensitive, medium volume | Predictable cost, data stays in VPC | Ops overhead, GPU reservation needed |
| Fine-tuned model | Domain-specific tasks | Best task performance, smaller model possible | Training cost, eval overhead, drift risk |
| Framework (LangChain) | Complex pipelines, RAG, agents | Batteries included, large community | Abstraction leaks, harder to debug |
| Custom pipeline | High performance, unusual needs | Full control, minimal overhead | Highest build cost, must build retries and observability yourself |
05 — Tools

Dev Frameworks and Tools

Frameworks wrap the five blocks into higher-level abstractions. They handle common patterns, reduce boilerplate, and add features like memory, monitoring, and multi-step pipelines.

Multi-Purpose Frameworks

1. LangChain

Swiss Army knife for LLM applications. Components for prompts, chains, memory, agents, RAG. Best for rapid prototyping and building with many models.

2. LangGraph

State machines for agents. Explicit control flow, human-in-the-loop, streaming. Better for production agents than basic LangChain.

3. LlamaIndex

Purpose-built for RAG. Data loaders, indexing, retrieval, evaluation. Stronger than LangChain for production RAG systems.

4. DSPy

Composable LLM pipelines with automatic optimization. Define what you want, DSPy optimizes prompts and fine-tunes models.

Framework selection: Start with prompting (no framework). Add LangChain when you need memory or chains. Graduate to LangGraph for agents, LlamaIndex for RAG. Don't over-engineer.
06 — Practice

Integration: A Complete Example

Here's how a real application combines multiple blocks. This is a code documentation Q&A system: a user asks questions about the code, the system retrieves relevant files (RAG), uses an agent to search docs and run code analysis, and finally answers with citations.

import anthropic
from subprocess import run
from vector_db import retrieve_files  # your vector DB client

# Block 1: Set up prompting with context
system_prompt = """You are a code documentation expert.
Answer questions about our codebase accurately.
Always cite the file and line number."""

# Block 2: Set up RAG
def get_context(query: str) -> list[str]:
    # Retrieve relevant files from the vector DB
    return retrieve_files(query, top_k=5)

# Block 3: Set up agent tools
tools = [
    {
        'name': 'search_docs',
        'description': 'Search documentation',
        'input_schema': {
            'type': 'object',
            'properties': {'query': {'type': 'string'}},
            'required': ['query'],
        },
    },
    {
        'name': 'run_test',
        'description': 'Run unit tests',
        'input_schema': {
            'type': 'object',
            'properties': {'pattern': {'type': 'string'}},
            'required': ['pattern'],
        },
    },
]

# Block 4/5: Execute integrated pipeline
def answer_question(user_question: str) -> str:
    # Step 1: Retrieve context (RAG)
    documents = get_context(user_question)
    context_str = '\n'.join(documents)

    # Step 2: Set up the agent conversation
    client = anthropic.Anthropic()
    messages = [
        {'role': 'user',
         'content': f'{user_question}\n\nRelevant files:\n{context_str}'}
    ]

    # Step 3: Agent loop
    while True:
        resp = client.messages.create(
            model='claude-opus-4-5',
            max_tokens=1024,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason == 'end_turn':
            # Prompting succeeded
            return next(
                (b.text for b in resp.content if hasattr(b, 'text')),
                'No answer',
            )

        # Process tool calls (agent acting)
        tool_results = []
        for block in resp.content:
            if block.type == 'tool_use':
                if block.name == 'search_docs':
                    result = get_context(block.input['query'])
                elif block.name == 'run_test':
                    result = run(
                        ['pytest', '-k', block.input['pattern']],
                        capture_output=True, text=True,
                    ).stdout
                tool_results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': str(result),
                })

        # Feed observations back to the agent
        messages.append({'role': 'assistant', 'content': resp.content})
        messages.append({'role': 'user', 'content': tool_results})

# This combines all five blocks:
# - Prompting: system prompt guides the LLM
# - RAG: retrieve_files injects context
# - Agents: agent loop with tools
# - Data engineering: documents must be clean, indexed
# - Fine-tuning: (optional) could train on Q&A pairs

How This Works

1. User asks: "How do I authenticate with the payment API?"
2. RAG kicks in: retrieves the payment module files.
3. Agent reasons: "I need more details about authentication patterns."
4. Agent acts: calls search_docs for "OAuth".
5. Prompting concludes: the LLM synthesizes an answer from the context and tool results.
6. User gets: an answer with file citations.

💡 Integration principle: Each block solves one problem. RAG solves "where is the data?", agents solve "what steps?", prompting solves "how to reason?". Combine them in order of simplicity, adding complexity only when needed.
Python · Complete RAG pipeline using LlamaIndex
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

# Global settings
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.0)
Settings.chunk_size = 512
Settings.chunk_overlap = 64

# 1. Load documents
documents = SimpleDirectoryReader("./docs").load_data()

# 2. Parse into chunks
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
print(f"Parsed {len(documents)} docs into {len(nodes)} chunks")

# 3. Build vector index (embeddings computed here)
index = VectorStoreIndex(nodes, show_progress=True)

# 4. Query
engine = index.as_query_engine(similarity_top_k=5)
response = engine.query(
    "What are the key trade-offs between RAG and fine-tuning?"
)
print(response)

# 5. Persist index for reuse (avoids re-embedding on restart)
index.storage_context.persist("./index_store")

# Reload later:
# from llama_index.core import StorageContext, load_index_from_storage
# storage_ctx = StorageContext.from_defaults(persist_dir="./index_store")
# index = load_index_from_storage(storage_ctx)
07 — Explore

Deep Dives: Each Building Block

Each technique deserves its own deep dive. Start with whichever block you're weakest on, then explore them in order of your application's complexity.

Core Building Blocks

1. Prompting

Prompt engineering, few-shot learning, chain-of-thought reasoning, and how to get the best results from an LLM without external data.

2. RAG

Retrieval-augmented generation: chunking, embedding, vector search, ranking, and context injection for fact-based answers.

3. Agents

Agent architecture: tool calling, ReAct loops, planning, memory, and frameworks for multi-step autonomous behavior.

4. Fine-Tuning

Model adaptation: training procedures, data labeling, hyperparameter tuning, and when fine-tuning beats prompting.

5. Data Engineering

Data infrastructure: cleaning, labeling, versioning, and pipelines that make every other block work reliably.

Learning path: Prompting → RAG → Data Engineering → Agents → Fine-Tuning. Master prompting first. Data engineering fixes most problems. Agents and fine-tuning are for advanced use cases.

Production Checklist

Before shipping a GenAI application, ensure you've covered these across all five blocks.

⚠️ Prompting: Version control your prompts. Test against edge cases. Monitor for model drift.
RAG: Measure retrieval quality. Re-index documents when they change. Monitor retrieval latency.
Agents: Set loop limits. Log every tool call. Monitor for infinite loops.
Fine-tuning: Reserve 20% of data for evaluation. Monitor for overfitting. Retrain as data changes.
Data: Establish labeling standards. Version datasets. Track data lineage.
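The agent items on this checklist, a hard loop limit plus a log line per tool call, can be wrapped around any agent loop. A sketch with illustrative names, where `decide_next_tool` stands in for the model's decision:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

MAX_STEPS = 8  # hard loop limit: the agent must terminate even when the model doesn't

def guarded_agent_loop(decide_next_tool, tools: dict) -> list:
    """Run an agent loop with a step cap and a log line per tool call."""
    results = []
    for step in range(MAX_STEPS):
        decision = decide_next_tool(results)
        if decision is None:  # model signalled it is done
            return results
        name, args = decision
        log.info("step=%d tool=%s args=%r", step, name, args)
        results.append(tools[name](**args))
    log.warning("loop limit reached after %d steps", MAX_STEPS)
    return results

# Toy stand-in for the model: call lookup twice, then stop.
tools = {"lookup": lambda q: f"result for {q}"}
def decide(history):
    return ("lookup", {"q": f"query {len(history)}"}) if len(history) < 2 else None

print(guarded_agent_loop(decide, tools))
```

The warning branch is the monitoring hook: an agent that regularly hits `MAX_STEPS` is either looping or under-tooled, and the per-step log lines tell you which.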