SECTION 01
What is LangChain?
LangChain is an open-source Python framework for building applications with large language models. Created in 2022 by Harrison Chase, it has become a de facto standard for LLM orchestration, with over 90k GitHub stars and widespread adoption across startups and enterprises.
Core Mission: "Take LLMs from proof-of-concept to production." LangChain abstracts away boilerplate, handles API management, and provides reusable components for common patterns (RAG, agents, memory, evaluation).
Key Components
- Chains: Sequences of LLM calls and actions. A simple "question → LLM → answer" pipeline is a chain.
- Agents: LLMs that can decide what tools to use and iterate. "Should I use a calculator, search, or respond directly?"
- Retrievers: Objects that fetch documents from knowledge bases. Abstract over vector stores, databases, APIs.
- Tools: Functions the agent can call (Google Search, Calculator, database query, etc.)
- Memory: Maintains conversation history and context across turns.
- Evaluators: Assess LLM outputs against criteria (correctness, safety, relevance).
Why Use LangChain?
- Rapid prototyping: Build a RAG app in 50 lines vs 500 without a framework
- Abstraction over models: Switch from OpenAI to Anthropic to Ollama with one line change
- Extensibility: Custom chains, tools, and retrievers inherit framework capabilities
- Production features: Caching, batching, async, streaming, error handling
- Integrations: 100+ integrations (vector stores, SQL databases, APIs, chat platforms)
Philosophy: LangChain is "glue" code. It standardizes interfaces so you can compose models, tools, and data sources. The magic is in simplification and reusability.
SECTION 02
Core Abstractions
LangChain provides unified interfaces for common components:
1. LLMs & ChatModels
Two model types with a consistent interface:
- LLM: Text-in, text-out (e.g., "Complete this: Hello...")
- ChatModel: Message-based (e.g., system prompts, user/assistant turns)
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
# All inherit same interface
model = ChatOpenAI(model="gpt-4")
# or
model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Same code works for both
response = model.invoke([
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is 2+2?"}
])
print(response.content) # "4"
2. PromptTemplates
Parameterized prompts. Define once, reuse with different variables:
from langchain.prompts import ChatPromptTemplate
template = ChatPromptTemplate.from_messages([
("system", "You are a {profession}."),
("user", "Answer this question: {question}")
])
# Reuse
prompt_scientist = template.format_messages(
profession="scientist",
question="What is photosynthesis?"
)
# → 2 messages: system=scientist, user=photosynthesis question
prompt_lawyer = template.format_messages(
profession="lawyer",
question="What is a contract?"
)
# → 2 messages: system=lawyer, user=contract question
3. OutputParsers
Parse LLM output (JSON, lists, structured data):
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel
class SentimentResponse(BaseModel):
sentiment: str # "positive", "negative", "neutral"
confidence: float # 0-1
explanation: str
parser = PydanticOutputParser(pydantic_object=SentimentResponse)
# Use in prompt
prompt = ChatPromptTemplate.from_template(
"Rate sentiment of: {text}\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())
# Parse output automatically
response = model.invoke(prompt.format(text="I love this!"))
parsed = parser.parse(response.content)
# → SentimentResponse(sentiment="positive", confidence=0.95, ...)
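Under the hood, an output parser simply extracts and validates structure from raw model text. A framework-free sketch of the idea (names like `parse_sentiment` and `Sentiment` are illustrative, not LangChain API):

```python
import json
from dataclasses import dataclass

# Illustrative sketch of what an output parser does: pull structured data
# out of raw model text and validate it against a schema.
@dataclass
class Sentiment:
    sentiment: str
    confidence: float
    explanation: str

def parse_sentiment(raw: str) -> Sentiment:
    # Tolerate prose around the JSON by slicing from first '{' to last '}'
    start, end = raw.index("{"), raw.rindex("}") + 1
    data = json.loads(raw[start:end])
    return Sentiment(**data)

raw = 'Sure! {"sentiment": "positive", "confidence": 0.95, "explanation": "Enthusiastic tone."}'
print(parse_sentiment(raw))
# Sentiment(sentiment='positive', confidence=0.95, explanation='Enthusiastic tone.')
```

PydanticOutputParser does the same job, plus it generates the format instructions you inject into the prompt.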
4. Retrievers
Abstract interface for document retrieval. Swap vector stores without changing code:
- VectorStoreRetriever (Pinecone, Weaviate, FAISS, Chroma)
- WikipediaRetriever, ArxivRetriever (API-backed retrieval)
- ParentDocumentRetriever (retrieve large docs, return excerpts)
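The contract every retriever shares is simple: a query goes in, a ranked list of documents comes out. A toy, framework-free sketch using naive keyword overlap (not a real LangChain class; real retrievers score with embeddings or BM25):

```python
# Illustrative sketch of the retriever contract: query in, top-k documents out.
# Scoring here is naive word overlap, standing in for vector similarity.
class KeywordRetriever:
    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        q = set(query.lower().split())
        scored = sorted(
            self.docs,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return scored[: self.k]

docs = [
    "Quantum entanglement links particle states.",
    "Photosynthesis converts light into energy.",
    "Entanglement enables quantum computing protocols.",
]
retriever = KeywordRetriever(docs, k=2)
print(retriever.invoke("what is quantum entanglement"))
# top-2 by overlap: the two entanglement documents
```

Because the interface is uniform, swapping this for a Chroma- or Pinecone-backed retriever changes construction, not the calling code.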
5. Tools
Functions agents can use. Defined with type hints and descriptions:
from langchain.tools import tool
@tool
def calculate_area(radius: float) -> float:
"""Calculate area of a circle given radius."""
import math
return math.pi * radius ** 2
# Tool has name, description, input schema from signature
# Agents discover this metadata and decide when to call
agent.invoke(
"What's the area of a circle with radius 5?"
)
# Agent: "I should use calculate_area tool"
# → 78.54...
Abstraction Win: All components follow the Runnable interface. They all have `.invoke()`, `.stream()`, `.batch()`. This consistency is powerful: you can compose chains without learning different APIs.
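The Runnable pattern itself is small enough to sketch in plain Python. This is an illustration of the idea, not LangChain's actual implementation:

```python
# Minimal sketch of a Runnable-style interface: every component exposes
# invoke/batch/stream, and __or__ lets the pipe operator compose them.
class MiniRunnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def batch(self, xs):
        return [self.invoke(x) for x in xs]

    def stream(self, x):
        # Naive streaming: yield the whole result as one chunk
        yield self.invoke(x)

    def __or__(self, other):
        # a | b -> a new runnable that feeds a's output into b
        return MiniRunnable(lambda x: other.invoke(self.invoke(x)))

upper = MiniRunnable(str.upper)
exclaim = MiniRunnable(lambda s: s + "!")
chain = upper | exclaim
print(chain.invoke("hello"))    # HELLO!
print(chain.batch(["a", "b"]))  # ['A!', 'B!']
```

Note that the composed chain is itself a MiniRunnable, so it can be piped further, which is exactly the property LCEL exploits in the next section.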
SECTION 03
LangChain Expression Language (LCEL)
LCEL (LangChain Expression Language) is a declarative way to compose chains using the pipe operator (`|`).
Basic Syntax
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
# Define components
prompt = ChatPromptTemplate.from_template("Translate to {language}: {text}")
model = ChatOpenAI()
output_parser = StrOutputParser()
# Compose with pipe operator
chain = prompt | model | output_parser
# Run
result = chain.invoke({
"language": "Spanish",
"text": "Hello, world!"
})
# β "Β‘Hola, mundo!"
# Equivalent to:
# result = output_parser.parse(model.invoke(prompt.format(...)))
Key Features
- Readability: Pipes show data flow left-to-right
- Composability: Chains are Runnables; can be piped further
- Parallel execution: Use `RunnableParallel` for concurrent branches
- Conditional routing: Use `RunnableBranch` to branch based on conditions
- Streaming: All chains support streaming by default
Advanced: Parallel Execution
from langchain.schema.runnable import RunnableParallel
# Run multiple branches in parallel
parallel_chain = RunnableParallel(
sentiment=sentiment_analyzer,
entities=entity_extractor,
summary=summarizer
)
result = parallel_chain.invoke(text)
# β {"sentiment": "...", "entities": [...], "summary": "..."}
# All run in parallel!
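Conceptually, RunnableParallel fans the same input out to every branch and merges the results into one dict. A plain-Python sketch of that behavior (the branch functions here are trivial stand-ins for real chains):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the RunnableParallel idea: run every branch on the same input
# concurrently, then merge results under their branch names.
def run_parallel(branches, x):
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, x) for name, fn in branches.items()}
        return {name: f.result() for name, f in futures.items()}

branches = {
    "length": lambda t: len(t),
    "upper": lambda t: t.upper(),
    "words": lambda t: t.split(),
}
print(run_parallel(branches, "hello world"))
# {'length': 11, 'upper': 'HELLO WORLD', 'words': ['hello', 'world']}
```

With real LLM branches, the concurrency matters: three model calls overlap instead of running back-to-back, so latency approaches that of the slowest branch.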
Streaming
All LCEL chains support streaming by default, enabling real-time response:
chain = prompt | model | output_parser
# Stream tokens as they arrive
for chunk in chain.stream({"language": "Spanish", "text": "Hello, world!"}):
    print(chunk, end="", flush=True)  # Print progressively
LCEL Advantage: Much more readable than nested function calls. `A | B | C` vs `C(B(A(...)))`. Plus, built-in support for streaming, batching, async.
SECTION 04
RAG with LangChain
Building a production RAG (Retrieval-Augmented Generation) system is straightforward with LangChain:
# RAG Pipeline with LangChain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
# Step 1: Load documents
loader = PyPDFLoader("quantum.pdf")
docs = loader.load()
# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = splitter.split_documents(docs)
# Step 3: Create embeddings & vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
# Step 4: Create RAG chain
prompt = ChatPromptTemplate.from_template(
"Based on these docs: {context}\n\nAnswer: {question}"
)
model = ChatOpenAI()
# Compose RAG chain
rag_chain = (
{
"context": retriever, # Retriever is a Runnable
"question": RunnablePassthrough() # Pass through the question
}
| prompt
| model
)
# Step 5: Run
answer = rag_chain.invoke("What is quantum entanglement?")
print(answer.content)
Explanation
- RunnablePassthrough(): Passes input unchanged. Allows multiple inputs in the chain.
- RunnableParallel: The `{...}` syntax creates a parallel Runnable. Both `context` and `question` are computed.
- Retriever as Runnable: Retrievers inherit Runnable interface. Can be piped like any other component.
Adding Chat History (Memory)
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True
)
# ConversationalRetrievalChain handles memory automatically
chain = ConversationalRetrievalChain.from_llm(
llm=model,
retriever=retriever,
memory=memory
)
# First turn
response1 = chain.invoke({"question": "What is quantum?"})
# Second turn (remembers "quantum" context)
response2 = chain.invoke({"question": "And how does it relate to computing?"})
# Memory enables multi-turn coherence
RAG Best Practice: Always chunk documents and include overlap (e.g., 1000 chars, 200 overlap). This prevents semantically important boundaries from being split mid-concept.
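The overlap idea is easy to see in a framework-free sketch (the real RecursiveCharacterTextSplitter is smarter: it also tries to split on paragraph and sentence boundaries first). Assumes overlap < chunk_size:

```python
# Sketch of fixed-size chunking with overlap: each window starts
# (chunk_size - overlap) characters after the previous one, so adjacent
# chunks share `overlap` characters of context.
def chunk_text(text, chunk_size, overlap):
    step = chunk_size - overlap  # assumes overlap < chunk_size
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of its predecessor; at real scale (1000 chars, 200 overlap) that repetition keeps a sentence split at a boundary retrievable from at least one chunk.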
SECTION 05
Agents with LangChain
Agents combine LLMs with tools, enabling dynamic reasoning and multi-step problem solving:
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import tool
# Define tools
@tool
def search_wikipedia(query: str) -> str:
"""Search Wikipedia for information."""
# Implement Wikipedia API call
pass
@tool
def get_weather(location: str) -> str:
"""Get current weather for a location."""
# Implement weather API call
pass
tools = [search_wikipedia, get_weather]
# Create ReAct agent (Reasoning + Acting)
model = ChatOpenAI(model="gpt-4")
# prompt_template: a ReAct-style prompt with {tools}, {tool_names}, and
# {agent_scratchpad} placeholders (e.g., hub.pull("hwchase17/react"))
agent = create_react_agent(model, tools, prompt_template)
# Execute agent
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({
"input": "What's the weather like in Paris, and what's famous there?"
})
# Agent decides: "I need weather + Wikipedia search"
# → Calls both tools iteratively
# → Synthesizes answer
Agent Loop
1. Thought: LLM reasons about the task
2. Action: LLM decides which tool to call
3. Observation: Tool result is returned
4. Repeat: If more tools needed, loop. Otherwise, return final answer.
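The loop above can be sketched offline with a scripted stand-in for the LLM (in a real agent, each observation is fed back into the model to produce the next thought):

```python
# Conceptual sketch of the Thought -> Action -> Observation loop.
# `model_steps` scripts the model's decisions so this runs without an LLM.
def react_loop(model_steps, tools, max_iterations=5):
    observations = []
    for step in model_steps[:max_iterations]:  # cap prevents runaway loops
        if step["action"] == "finish":
            return step["answer"]
        tool = tools[step["action"]]                # Action: pick a tool
        observations.append(tool(step["input"]))    # Observation: run it
    return None  # hit the iteration limit without a final answer

tools = {"calculator": lambda expr: eval(expr)}  # eval: demo only, never on untrusted input
steps = [
    {"action": "calculator", "input": "3.14159 * 5 ** 2"},
    {"action": "finish", "answer": "The area is about 78.5."},
]
print(react_loop(steps, tools))  # The area is about 78.5.
```

The `max_iterations` cap mirrors AgentExecutor's `max_iterations` safeguard mentioned in the tip below.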
Structured Output Agents
For more reliable parsing, use structured chat agents that emit JSON-formatted actions:
from langchain.agents import create_structured_chat_agent
# Agent outputs JSON actions, not free text
agent = create_structured_chat_agent(
    llm=model,
    tools=tools,
    prompt=prompt
)
# JSON actions are far easier to parse reliably than free-form text
Agent Tip: Agents are powerful but can hallucinate tool calls or get stuck in loops. Use verbose=True to debug. Limit max_iterations to prevent runaway loops. Provide clear tool descriptions.
SECTION 06
LangSmith Integration
LangSmith is LangChain's production platform for tracing, evaluating, and monitoring LLM applications.
Features
- Tracing: Log all LLM calls, tools, and latencies. See exactly what happened in each request.
- Evaluation: Run tests against your chains. Check if outputs meet quality criteria.
- Prompt Management: Version prompts. A/B test prompt variations.
- Monitoring: Track performance in production. Alert on errors, latency spikes.
Setup
import os
# Set API key
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"
# All LangChain calls are automatically traced
# View in https://smith.langchain.com/
chain.invoke({"query": "..."})
# Trace appears in LangSmith dashboard
Evaluation in LangSmith
from langsmith import evaluate
# Define evaluators
def check_answer_length(output: dict) -> bool:
"""Evaluator: answer should be < 200 words."""
return len(output["answer"].split()) < 200
def check_factuality(output: dict) -> bool:
"""Evaluator: use LLM judge for factuality."""
judge = ChatOpenAI(model="gpt-4")
eval_prompt = f"""Is this answer factually correct?
Output: {output['answer']}
Return: yes/no"""
result = judge.invoke(eval_prompt)
return "yes" in result.lower()
# Run evaluation (simplified; see the LangSmith docs for exact
# dataset formats and evaluator signatures)
results = evaluate(
lambda inputs: rag_chain.invoke(inputs),
data=[
{"question": "What is photosynthesis?", "expected": "..."},
{"question": "How does DNA work?", "expected": "..."}
],
evaluators=[check_answer_length, check_factuality],
experiment_prefix="rag-v1"
)
# Results in LangSmith, compare to previous versions
LangSmith Dashboard Insights
- Trace view: Click into a request to see all LLM calls, latencies, token counts
- Dataset management: Version test datasets, track evaluation results over time
- Comparison: A/B test new prompts or models against baseline
- Production monitoring: Real-time stats on latency, error rates, token usage
LangSmith Cost: There is a free tier that covers small projects; paid plans add trace volume, seats, and retention (check current pricing). Worth it for production monitoring, optional for hobby projects.
SECTION 07
LangChain vs Alternatives
LangChain dominates but isn't alone. Comparison with key alternatives:
| Framework | Best For | Strengths | Weaknesses |
|---|---|---|---|
| LangChain | General-purpose LLM apps, RAG, agents | Massive ecosystem; LCEL elegance; production-ready (LangSmith) | Can be verbose; large codebase; frequent API changes |
| LlamaIndex | RAG, document indexing, data connectors | Best-in-class RAG; 100+ data connectors; auto-indexing strategies | Less focused on agents; smaller ecosystem |
| LangGraph | Complex workflows, agents with memory | Explicit control flow; graph-based reasoning; built by LangChain team | Newer, fewer examples; learning curve for graph thinking |
| Pydantic AI | Type-safe agents, structured outputs | Strong typing; structured validation; clean API | Newer, less mature; smaller community |
| Raw SDK (openai, anthropic) | Simple scripts, full control | Lightweight, minimal dependencies; direct model access | Manual chain management; boilerplate; no abstractions |
When to Use Each
Use LangChain if:
✓ Building a production RAG or agent system
✓ Need integrations with many vector stores, databases
✓ Want LangSmith monitoring
✓ Team is already familiar with it (most common in industry)
Use LlamaIndex if:
✓ Focus is RAG and document indexing
✓ Need auto-indexing strategies (hierarchical, hybrid, etc.)
✓ Working with semi-structured data (Notion, Google Docs)
Use LangGraph if:
✓ Complex multi-turn workflows with explicit control
✓ Need shared state across agent loops
✓ Building stateful applications with memory
Use raw SDK if:
✓ Simple script or MVP
✓ Full control is priority over convenience
✓ Want zero dependencies
LangChain + LlamaIndex Hybrid
In practice, many teams use both: LlamaIndex for RAG indexing, LangChain for agents and orchestration:
# LlamaIndex for indexing
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
llama_retriever = index.as_retriever()
# Wrap in LangChain Retriever interface
from langchain.tools import Tool
retriever_tool = Tool(
name="document_search",
func=lambda q: "\n".join(d.node.get_content() for d in llama_retriever.retrieve(q)),
description="Search documents"
)
# Use in LangChain agent
agent = create_react_agent(model, [retriever_tool, ...])
Recommendation: Start with LangChain for general work. If RAG becomes complex, add LlamaIndex for indexing. If agents get complex, migrate to LangGraph. Most production systems use LangChain + LangSmith as the core stack.