SECTION 01
What is LangChain?
LangChain is an open-source Python framework for building applications with large language models. Created in 2022 by Harrison Chase, it has become a de facto standard for LLM orchestration, with over 90k GitHub stars and widespread adoption across startups and enterprises.
Core Mission: "Take LLMs from proof-of-concept to production." LangChain abstracts away boilerplate, handles API management, and provides reusable components for common patterns (RAG, agents, memory, evaluation).
Key Components
- Chains: Sequences of LLM calls and actions. A simple "question → LLM → answer" pipeline is a chain.
- Agents: LLMs that can decide what tools to use and iterate. "Should I use a calculator, search, or respond directly?"
- Retrievers: Objects that fetch documents from knowledge bases. Abstract over vector stores, databases, APIs.
- Tools: Functions the agent can call (Google Search, Calculator, database query, etc.)
- Memory: Maintains conversation history and context across turns.
- Evaluators: Assess LLM outputs against criteria (correctness, safety, relevance).
Why Use LangChain?
- Rapid prototyping: Build a RAG app in 50 lines vs 500 without a framework
- Abstraction over models: Switch from OpenAI to Anthropic to Ollama with one line change
- Extensibility: Custom chains, tools, and retrievers inherit framework capabilities
- Production features: Caching, batching, async, streaming, error handling
- Integrations: 100+ integrations (vector stores, SQL databases, APIs, chat platforms)
Philosophy: LangChain is "glue" code. It standardizes interfaces so you can compose models, tools, and data sources. The magic is in simplification and reusability.
SECTION 02
Core Abstractions
LangChain provides unified interfaces for common components:
1. LLMs & ChatModels
Two model types with a consistent interface:
- LLM: Text-in, text-out (e.g., "Complete this: Hello...")
- ChatModel: Message-based (e.g., system prompts, user/assistant turns)
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
# All inherit same interface
model = ChatOpenAI(model="gpt-4")
# or
model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Same code works for both
response = model.invoke([
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is 2+2?"}
])
print(response.content) # "4"
2. PromptTemplates
Parameterized prompts. Define once, reuse with different variables:
from langchain.prompts import ChatPromptTemplate
template = ChatPromptTemplate.from_messages([
("system", "You are a {profession}."),
("user", "Answer this question: {question}")
])
# Reuse
prompt_scientist = template.format_messages(
profession="scientist",
question="What is photosynthesis?"
)
# → 2 messages: system=scientist, user=photosynthesis question
prompt_lawyer = template.format_messages(
profession="lawyer",
question="What is a contract?"
)
# → 2 messages: system=lawyer, user=contract question
3. OutputParsers
Parse LLM output (JSON, lists, structured data):
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel
class SentimentResponse(BaseModel):
sentiment: str # "positive", "negative", "neutral"
confidence: float # 0-1
explanation: str
parser = PydanticOutputParser(pydantic_object=SentimentResponse)
# Use in prompt
prompt = ChatPromptTemplate.from_template(
"Rate sentiment of: {text}\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())
# Parse output automatically
response = model.invoke(prompt.format(text="I love this!"))
parsed = parser.parse(response.content)
# → SentimentResponse(sentiment="positive", confidence=0.95, ...)
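Under the hood, an output parser simply extracts and validates structure from raw model text. A framework-free sketch of the idea (names like `parse_sentiment` and `Sentiment` are illustrative, not LangChain API):

```python
import json
from dataclasses import dataclass

# Illustrative sketch of what an output parser does: pull structured data
# out of raw model text and validate it against a schema.
@dataclass
class Sentiment:
    sentiment: str
    confidence: float
    explanation: str

def parse_sentiment(raw: str) -> Sentiment:
    # Tolerate prose around the JSON by slicing from first '{' to last '}'
    start, end = raw.index("{"), raw.rindex("}") + 1
    data = json.loads(raw[start:end])
    return Sentiment(**data)

raw = 'Sure! {"sentiment": "positive", "confidence": 0.95, "explanation": "Enthusiastic tone."}'
print(parse_sentiment(raw))
# Sentiment(sentiment='positive', confidence=0.95, explanation='Enthusiastic tone.')
```

PydanticOutputParser does the same job, plus it generates the format instructions you inject into the prompt.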
4. Retrievers
Abstract interface for document retrieval. Swap vector stores without changing code:
- VectorStoreRetriever (Pinecone, Weaviate, FAISS, Chroma)
- WikipediaRetriever, ArxivRetriever (API-backed retrieval)
- ParentDocumentRetriever (retrieve large docs, return excerpts)
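The contract every retriever shares is simple: a query goes in, a ranked list of documents comes out. A toy, framework-free sketch using naive keyword overlap (not a real LangChain class; real retrievers score with embeddings or BM25):

```python
# Illustrative sketch of the retriever contract: query in, top-k documents out.
# Scoring here is naive word overlap, standing in for vector similarity.
class KeywordRetriever:
    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        q = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return scored[: self.k]

docs = [
    "Quantum entanglement links particle states.",
    "Photosynthesis converts light into energy.",
    "Entanglement enables quantum computing protocols.",
]
retriever = KeywordRetriever(docs, k=2)
print(retriever.invoke("what is quantum entanglement"))
# top-2 by overlap: the two entanglement documents
```

Because the interface is uniform, swapping this for a Chroma- or Pinecone-backed retriever changes construction, not the calling code.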
5. Tools
Functions agents can use. Defined with type hints and descriptions:
from langchain.tools import tool
@tool
def calculate_area(radius: float) -> float:
"""Calculate area of a circle given radius."""
import math
return math.pi * radius ** 2
# Tool has name, description, input schema from signature
# Agents discover this metadata and decide when to call
agent.invoke(
"What's the area of a circle with radius 5?"
)
# Agent: "I should use calculate_area tool"
# → 78.54...
Abstraction Win: All components follow the Runnable interface. They all have `.invoke()`, `.stream()`, `.batch()`. This consistency is powerful: you can compose chains without learning different APIs.
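The Runnable pattern itself is small enough to sketch in plain Python. This is an illustration of the idea, not LangChain's actual implementation:

```python
# Minimal sketch of a Runnable-style interface: every component exposes
# invoke/batch/stream, and __or__ lets the pipe operator compose them.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def batch(self, xs):
        return [self.invoke(x) for x in xs]

    def stream(self, x):
        # Naive streaming: yield the whole result as one chunk
        yield self.invoke(x)

    def __or__(self, other):
        # a | b -> a new runnable that feeds a's output into b
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

upper = MiniRunnable(str.upper)
exclaim = MiniRunnable(lambda s: s + "!")
chain = upper | exclaim
print(chain.invoke("hello"))    # HELLO!
print(chain.batch(["a", "b"]))  # ['A!', 'B!']
```

Note that the composed chain is itself a MiniRunnable, so it can be piped further, which is exactly the property LCEL exploits in the next section.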
SECTION 03
LangChain Expression Language (LCEL)
LCEL (LangChain Expression Language) is a declarative way to compose chains using the pipe operator (`|`).
Basic Syntax
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
# Define components
prompt = ChatPromptTemplate.from_template("Translate to {language}: {text}")
model = ChatOpenAI()
output_parser = StrOutputParser()
# Compose with pipe operator
chain = prompt | model | output_parser
# Run
result = chain.invoke({
"language": "Spanish",
"text": "Hello, world!"
})
# β "Β‘Hola, mundo!"
# Equivalent to:
# result = output_parser.parse(model.invoke(prompt.format(...)))
Key Features
- Readability: Pipes show data flow left-to-right
- Composability: Chains are Runnables; can be piped further
- Parallel execution: Use `RunnableParallel` for concurrent branches
- Conditional routing: Use `RunnableBranch` to branch based on conditions
- Streaming: All chains support streaming by default
Advanced: Parallel Execution
from langchain.schema.runnable import RunnableParallel
# Run multiple branches in parallel
parallel_chain = RunnableParallel(
sentiment=sentiment_analyzer,
entities=entity_extractor,
summary=summarizer
)
result = parallel_chain.invoke(text)
# β {"sentiment": "...", "entities": [...], "summary": "..."}
# All run in parallel!
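Conceptually, RunnableParallel fans the same input out to every branch and merges the results into one dict. A plain-Python sketch of that behavior (the branch functions here are trivial stand-ins for real chains):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the RunnableParallel idea: run every branch on the same input
# concurrently, then merge results under their branch names.
def run_parallel(branches, x):
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, x) for name, fn in branches.items()}
        return {name: f.result() for name, f in futures.items()}

branches = {
    "length": lambda t: len(t),
    "upper": lambda t: t.upper(),
    "words": lambda t: t.split(),
}
print(run_parallel(branches, "hello world"))
# {'length': 11, 'upper': 'HELLO WORLD', 'words': ['hello', 'world']}
```

With real LLM branches, the concurrency matters: three model calls overlap instead of running back-to-back, so latency approaches that of the slowest branch.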
Streaming
All LCEL chains support streaming by default, enabling real-time response:
chain = prompt | model | output_parser
# Stream tokens as they arrive
for chunk in chain.stream({"language": "Spanish", "text": "Hello, world!"}):
    print(chunk, end="", flush=True)  # Print progressively
LCEL Advantage: Much more readable than nested function calls. `A | B | C` vs `C(B(A(...)))`. Plus, built-in support for streaming, batching, async.
SECTION 04
RAG with LangChain
Building a production RAG (Retrieval-Augmented Generation) system is straightforward with LangChain:
# RAG Pipeline with LangChain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
# Step 1: Load documents
loader = PyPDFLoader("quantum.pdf")
docs = loader.load()
# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = splitter.split_documents(docs)
# Step 3: Create embeddings & vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# Step 4: Create RAG chain
prompt = ChatPromptTemplate.from_template(
"Based on these docs: {context}\n\nAnswer: {question}"
)
model = ChatOpenAI()
# Compose RAG chain
rag_chain = (
{
"context": retriever, # Retriever is a Runnable
"question": RunnablePassthrough() # Pass through the question
}
| prompt
| model
)
# Step 5: Run
answer = rag_chain.invoke("What is quantum entanglement?")
print(answer.content)
Explanation
- RunnablePassthrough(): Passes input unchanged. Allows multiple inputs in the chain.
- RunnableParallel: The `{...}` syntax creates a parallel Runnable. Both `context` and `question` are computed.
- Retriever as Runnable: Retrievers inherit Runnable interface. Can be piped like any other component.
Adding Chat History (Memory)
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True
)
# ConversationalRetrievalChain handles memory automatically
chain = ConversationalRetrievalChain.from_llm(
llm=model,
retriever=retriever,
memory=memory
)
# First turn
response1 = chain.invoke({"question": "What is quantum?"})
# Second turn (remembers "quantum" context)
response2 = chain.invoke({"question": "And how does it relate to computing?"})
# Memory enables multi-turn coherence
RAG Best Practice: Always chunk documents and include overlap (e.g., 1000 chars, 200 overlap). This prevents semantically important boundaries from being split mid-concept.
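The overlap idea is easy to see in a framework-free sketch (the real RecursiveCharacterTextSplitter is smarter: it also tries to split on paragraph and sentence boundaries first). Assumes overlap < chunk_size:

```python
# Sketch of fixed-size chunking with overlap: each window starts
# (chunk_size - overlap) characters after the previous one, so adjacent
# chunks share `overlap` characters of context.
def chunk_text(text, chunk_size, overlap):
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of its predecessor; at real scale (1000 chars, 200 overlap) that repetition keeps a sentence split at a boundary retrievable from at least one chunk.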
SECTION 05
Agents with LangChain
Agents combine LLMs with tools, enabling dynamic reasoning and multi-step problem solving:
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import tool
# Define tools
@tool
def search_wikipedia(query: str) -> str:
"""Search Wikipedia for information."""
# Implement Wikipedia API call
pass
@tool
def get_weather(location: str) -> str:
"""Get current weather for a location."""
# Implement weather API call
pass
tools = [search_wikipedia, get_weather]
# Create ReAct agent (Reasoning + Acting)
model = ChatOpenAI(model="gpt-4")
# prompt_template: a ReAct-style prompt with {tools}, {tool_names}, and
# {agent_scratchpad} placeholders (e.g., hub.pull("hwchase17/react"))
agent = create_react_agent(model, tools, prompt_template)
# Execute agent
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({
"input": "What's the weather like in Paris, and what's famous there?"
})
# Agent decides: "I need weather + Wikipedia search"
# → Calls both tools iteratively
# → Synthesizes answer
Agent Loop
1. Thought: LLM reasons about the task
2. Action: LLM decides which tool to call
3. Observation: Tool result is returned
4. Repeat: If more tools needed, loop. Otherwise, return final answer.
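The loop above can be sketched offline with a scripted stand-in for the LLM (in a real agent, each observation is fed back into the model to produce the next thought):

```python
# Conceptual sketch of the Thought -> Action -> Observation loop.
# `model_steps` scripts the model's decisions so this runs without an LLM.
def react_loop(model_steps, tools, max_iterations=5):
    observations = []
    for step in model_steps[:max_iterations]:  # cap prevents runaway loops
        if step["action"] == "finish":
            return step["answer"]
        tool = tools[step["action"]]                # Action: pick a tool
        observations.append(tool(step["input"]))    # Observation: run it
    return None  # hit the iteration limit without a final answer

tools = {"calculator": lambda expr: eval(expr)}  # eval: demo only, never on untrusted input
steps = [
    {"action": "calculator", "input": "3.14159 * 5 ** 2"},
    {"action": "finish", "answer": "The area is about 78.5."},
]
print(react_loop(steps, tools))  # The area is about 78.5.
```

The `max_iterations` cap mirrors AgentExecutor's `max_iterations` safeguard mentioned in the tip below.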
Structured Output Agents
For more reliable parsing, use structured chat agents that emit JSON-formatted actions:
from langchain.agents import create_structured_chat_agent
# Agent outputs JSON actions, not free text
agent = create_structured_chat_agent(
    llm=model,
    tools=tools,
    prompt=prompt
)
# JSON actions are far easier to parse reliably than free-form text
Agent Tip: Agents are powerful but can hallucinate tool calls or get stuck in loops. Use verbose=True to debug. Limit max_iterations to prevent runaway loops. Provide clear tool descriptions.
SECTION 06
LangSmith Integration
LangSmith is LangChain's production platform for tracing, evaluating, and monitoring LLM applications.
Features
- Tracing: Log all LLM calls, tools, and latencies. See exactly what happened in each request.
- Evaluation: Run tests against your chains. Check if outputs meet quality criteria.
- Prompt Management: Version prompts. A/B test prompt variations.
- Monitoring: Track performance in production. Alert on errors, latency spikes.
Setup
import os
# Set API key
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"
# All LangChain calls are automatically traced
# View in https://smith.langchain.com/
chain.invoke({"query": "..."})
# Trace appears in LangSmith dashboard
Evaluation in LangSmith
from langsmith import evaluate
# Define evaluators
def check_answer_length(output: dict) -> bool:
"""Evaluator: answer should be < 200 words."""
return len(output["answer"].split()) < 200
def check_factuality(output: dict) -> bool:
"""Evaluator: use LLM judge for factuality."""
judge = ChatOpenAI(model="gpt-4")
eval_prompt = f"""Is this answer factually correct?
Output: {output['answer']}
Return: yes/no"""
result = judge.invoke(eval_prompt)
return "yes" in result.lower()
# Run evaluation (simplified; see the LangSmith docs for exact
# dataset formats and evaluator signatures)
results = evaluate(
lambda inputs: rag_chain.invoke(inputs),
data=[
{"question": "What is photosynthesis?", "expected": "..."},
{"question": "How does DNA work?", "expected": "..."}
],
evaluators=[check_answer_length, check_factuality],
experiment_prefix="rag-v1"
)
# Results in LangSmith, compare to previous versions
LangSmith Dashboard Insights
- Trace view: Click into a request to see all LLM calls, latencies, token counts
- Dataset management: Version test datasets, track evaluation results over time
- Comparison: A/B test new prompts or models against baseline
- Production monitoring: Real-time stats on latency, error rates, token usage
LangSmith Cost: There is a free tier that covers small projects; paid plans add trace volume, seats, and retention (check current pricing). Worth it for production monitoring, optional for hobby projects.
SECTION 07
LangChain vs Alternatives
LangChain dominates but isn't alone. Comparison with key alternatives:
| Framework | Best For | Strengths | Weaknesses |
|---|---|---|---|
| LangChain | General-purpose LLM apps, RAG, agents | Massive ecosystem; LCEL elegance; production-ready (LangSmith) | Can be verbose; large codebase; frequent API changes |
| LlamaIndex | RAG, document indexing, data connectors | Best-in-class RAG; 100+ data connectors; auto-indexing strategies | Less focused on agents; smaller ecosystem |
| LangGraph | Complex workflows, agents with memory | Explicit control flow; graph-based reasoning; built by LangChain team | Newer, fewer examples; learning curve for graph thinking |
| Pydantic AI | Type-safe agents, structured outputs | Strong typing; structured validation; clean API | Newer, less mature; smaller community |
| Raw SDK (openai, anthropic) | Simple scripts, full control | Lightweight, minimal dependencies; direct model access | Manual chain management; boilerplate; no abstractions |
When to Use Each
Use LangChain if:
✓ Building a production RAG or agent system
✓ Need integrations with many vector stores, databases
✓ Want LangSmith monitoring
✓ Team is already familiar with it (most common in industry)
Use LlamaIndex if:
✓ Focus is RAG and document indexing
✓ Need auto-indexing strategies (hierarchical, hybrid, etc.)
✓ Working with semi-structured data (Notion, Google Docs)
Use LangGraph if:
✓ Complex multi-turn workflows with explicit control
✓ Need shared state across agent loops
✓ Building stateful applications with memory
Use raw SDK if:
✓ Simple script or MVP
✓ Full control is priority over convenience
✓ Want zero dependencies
LangChain + LlamaIndex Hybrid
In practice, many teams use both: LlamaIndex for RAG indexing, LangChain for agents and orchestration:
# LlamaIndex for indexing
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
llama_retriever = index.as_retriever()
# Wrap in LangChain Retriever interface
from langchain.tools import Tool
retriever_tool = Tool(
name="document_search",
func=lambda q: "\n".join(d.node.get_content() for d in llama_retriever.retrieve(q)),
description="Search documents"
)
# Use in LangChain agent
agent = create_react_agent(model, [retriever_tool, ...])
Recommendation: Start with LangChain for general work. If RAG becomes complex, add LlamaIndex for indexing. If agents get complex, migrate to LangGraph. Most production systems use LangChain + LangSmith as the core stack.