Agents & Orchestration

LangChain

A composable framework for building production LLM applications with chains, agents, retrievers, and memory.

2022
Founded
90k+
GitHub Stars
LCEL
Modern Interface

SECTION 01

What is LangChain

LangChain is an open-source Python framework for building applications with large language models. Founded in 2022 by Harrison Chase, it has become the de facto standard for LLM orchestration, with over 90k GitHub stars and widespread adoption in startups and enterprises.

Core Mission: "Take LLMs from proof-of-concept to production." LangChain abstracts away boilerplate, handles API management, and provides reusable components for common patterns (RAG, agents, memory, evaluation).

Key Components

Models (LLMs and chat models), prompt templates, output parsers, retrievers, tools, and memory, each covered in the sections that follow.

Why Use LangChain?

Provider-agnostic interfaces (swap OpenAI for Anthropic without rewriting code), reusable components for RAG, agents, and memory, a large integration ecosystem, and production tooling via LangSmith.

Philosophy: LangChain is "glue" code. It standardizes interfaces so you can compose models, tools, and data sources. The magic is in simplification and reusability.
SECTION 02

Core Abstractions

LangChain provides unified interfaces for common components:

1. LLMs & ChatModels

Two model types with a consistent interface:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Both providers implement the same chat-model interface
model = ChatOpenAI(model="gpt-4")
# or
model = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# The same code works for both
response = model.invoke([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is 2+2?"},
])
print(response.content)  # "4"
```

2. PromptTemplates

Parameterized prompts. Define once, reuse with different variables:

```python
from langchain.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ("system", "You are a {profession}."),
    ("user", "Answer this question: {question}"),
])

# Reuse with different variables
prompt_scientist = template.format_messages(
    profession="scientist",
    question="What is photosynthesis?",
)
# → 2 messages: system=scientist, user=photosynthesis question

prompt_lawyer = template.format_messages(
    profession="lawyer",
    question="What is a contract?",
)
# → 2 messages: system=lawyer, user=contract question
```

3. OutputParsers

Parse LLM output (JSON, lists, structured data):

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate
from pydantic import BaseModel

class SentimentResponse(BaseModel):
    sentiment: str     # "positive", "negative", "neutral"
    confidence: float  # 0-1
    explanation: str

parser = PydanticOutputParser(pydantic_object=SentimentResponse)

# Inject the format instructions into the prompt
prompt = ChatPromptTemplate.from_template(
    "Rate sentiment of: {text}\n{format_instructions}"
).partial(format_instructions=parser.get_format_instructions())

# Parse the output automatically (model as defined above)
response = model.invoke(prompt.format_messages(text="I love this!"))
parsed = parser.parse(response.content)
# → SentimentResponse(sentiment="positive", confidence=0.95, ...)
```

4. Retrievers

Abstract interface for document retrieval. Swap vector stores without changing code:
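A toy sketch of this idea, using hypothetical stand-in classes rather than real vector stores: application code depends only on a shared retrieval method, so the backend can be swapped without touching it.

```python
# Stand-in "backends" sharing one retrieval interface (illustrative only,
# not real LangChain vector stores).

class KeywordRetriever:
    """Naive backend: substring match."""
    def __init__(self, docs):
        self.docs = docs

    def get_relevant_documents(self, query):
        return [d for d in self.docs if query.lower() in d.lower()]

class LengthRetriever:
    """Pretend 'semantic' backend: returns the two shortest documents."""
    def __init__(self, docs):
        self.docs = docs

    def get_relevant_documents(self, query):
        return sorted(self.docs, key=len)[:2]

def answer(question, retriever):
    # Application code never mentions the backend
    context = retriever.get_relevant_documents(question)
    return f"{len(context)} doc(s) retrieved for: {question}"

docs = ["Chroma stores vectors", "FAISS is fast", "Pinecone is managed"]
print(answer("fast", KeywordRetriever(docs)))  # 1 doc(s) retrieved for: fast
print(answer("fast", LengthRetriever(docs)))   # 2 doc(s) retrieved for: fast
```

Swapping `KeywordRetriever` for `LengthRetriever` changes nothing in `answer`; LangChain's retriever abstraction makes the same move for Chroma, FAISS, Pinecone, and others.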

5. Tools

Functions agents can use. Defined with type hints and descriptions:

```python
from langchain.tools import tool

@tool
def calculate_area(radius: float) -> float:
    """Calculate area of a circle given radius."""
    import math
    return math.pi * radius ** 2

# The tool gets a name, description, and input schema from the signature.
# Agents discover this metadata and decide when to call it.
# (assuming an agent built with this tool; see Section 05)
agent.invoke("What's the area of a circle with radius 5?")
# Agent: "I should use the calculate_area tool"
# → 78.54...
```
Abstraction Win: All components follow the Runnable interface: they all have `.invoke()`, `.stream()`, and `.batch()`. This consistency is powerful; you can compose chains without learning different APIs.
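A toy illustration of why a shared interface matters (plain-Python stand-ins, not LangChain classes): once every component exposes the same `.invoke()`, composition and batching become generic.

```python
# Components that share one contract: .invoke(value) -> value

class Upper:
    def invoke(self, text: str) -> str:
        return text.upper()

class Exclaim:
    def invoke(self, text: str) -> str:
        return text + "!"

class Pipeline:
    """Compose any components that share the .invoke() contract."""
    def __init__(self, *steps):
        self.steps = steps

    def invoke(self, value):
        for step in self.steps:
            value = step.invoke(value)
        return value

    def batch(self, values):
        # Batching falls out for free once invoke() is uniform
        return [self.invoke(v) for v in values]

chain = Pipeline(Upper(), Exclaim())
print(chain.invoke("hello"))    # HELLO!
print(chain.batch(["a", "b"]))  # ['A!', 'B!']
```

The `Pipeline` class never needs to know what its steps do, which is exactly the property LCEL exploits with the `|` operator.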
SECTION 03

LangChain Expression Language (LCEL)

LCEL is a declarative way to compose chains using the pipe operator (`|`).

Basic Syntax

```python
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_template("Translate to {language}: {text}")
model = ChatOpenAI()
output_parser = StrOutputParser()

# Compose with the pipe operator
chain = prompt | model | output_parser

# Run
result = chain.invoke({
    "language": "Spanish",
    "text": "Hello, world!",
})
# → "¡Hola, mundo!"

# Roughly equivalent to:
# result = output_parser.parse(model.invoke(prompt.format_messages(...)))
```

Key Features

Every LCEL chain gets streaming, batching, and async support for free, plus parallel execution of independent branches, all through the uniform `.invoke()` / `.stream()` / `.batch()` methods.

Advanced: Parallel Execution

```python
from langchain.schema.runnable import RunnableParallel

# Run multiple branches concurrently
# (assuming sentiment_analyzer, entity_extractor, summarizer are Runnables)
parallel_chain = RunnableParallel(
    sentiment=sentiment_analyzer,
    entities=entity_extractor,
    summary=summarizer,
)

result = parallel_chain.invoke(text)
# → {"sentiment": "...", "entities": [...], "summary": "..."}
# All branches run in parallel
```

Streaming

All LCEL chains support streaming by default, enabling real-time responses:

```python
chain = prompt | model | output_parser

# Stream tokens as they arrive
for chunk in chain.stream({"query": "Explain quantum computing"}):
    print(chunk, end="", flush=True)  # print progressively
```
LCEL Advantage: Much more readable than nested function calls (`A | B | C` vs `C(B(A(...)))`), with built-in support for streaming, batching, and async.
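The streaming and batching semantics can be illustrated with plain Python (stand-in functions, not LangChain APIs): streaming yields partial output as it is produced, while batch maps a single-input call over many inputs.

```python
# Plain-Python stand-ins for the LCEL streaming/batching semantics

def stream_tokens(text):
    """Yield output piece by piece, like chain.stream()."""
    for token in text.split():
        yield token + " "

def batch(fn, inputs):
    """Map a single-input call over many inputs, like chain.batch()."""
    return [fn(x) for x in inputs]

# Streaming: the caller sees partial output immediately
collected = ""
for chunk in stream_tokens("quantum computing uses qubits"):
    collected += chunk  # a UI would render each chunk as it arrives

print(collected.strip())                     # quantum computing uses qubits
print(batch(str.upper, ["hola", "mundo"]))   # ['HOLA', 'MUNDO']
```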
SECTION 04

RAG with LangChain

Building a production RAG (Retrieval-Augmented Generation) system is straightforward with LangChain:

```python
# RAG pipeline with LangChain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough

# Step 1: Load documents
loader = PyPDFLoader("quantum.pdf")
docs = loader.load()

# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)

# Step 3: Create embeddings & vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Step 4: Create the RAG chain
prompt = ChatPromptTemplate.from_template(
    "Based on these docs: {context}\n\nAnswer: {question}"
)
model = ChatOpenAI()

def format_docs(docs):
    """Join retrieved documents into a single context string."""
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {
        "context": retriever | format_docs,  # retriever is a Runnable
        "question": RunnablePassthrough(),   # pass the question through
    }
    | prompt
    | model
)

# Step 5: Run
answer = rag_chain.invoke("What is quantum entanglement?")
print(answer.content)
```

Explanation

The dict at the head of the chain fans the input out: the retriever fills `{context}` with relevant chunks while `RunnablePassthrough` forwards the raw question into `{question}`. The prompt formats both into messages, and the model generates a grounded answer.

Adding Chat History (Memory)

```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

# ConversationalRetrievalChain manages memory automatically
chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=retriever,
    memory=memory,
)

# First turn
response1 = chain.invoke({"question": "What is quantum?"})

# Second turn (remembers the "quantum" context)
response2 = chain.invoke({"question": "And how does it relate to computing?"})
# Memory enables multi-turn coherence
```
RAG Best Practice: Always chunk documents with overlap (e.g., 1000 chars, 200 overlap), so a concept that straddles a chunk boundary still appears whole in at least one chunk.
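A minimal character-level chunker shows the mechanics of overlap (illustrative only; LangChain's RecursiveCharacterTextSplitter additionally respects separators like paragraphs and sentences):

```python
# Character chunker with overlap: each chunk starts (size - overlap)
# characters after the previous one, so neighbors share `overlap` chars.

def chunk(text, size, overlap):
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "abcdefghij" * 3  # 30 characters
chunks = chunk(text, size=10, overlap=4)

# Consecutive chunks share their last/first 4 characters, so anything
# spanning a boundary appears whole in at least one chunk.
for c in chunks:
    print(c)
```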
SECTION 05

Agents with LangChain

Agents combine LLMs with tools, enabling dynamic reasoning and multi-step problem solving:

```python
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import tool

# Define tools
@tool
def search_wikipedia(query: str) -> str:
    """Search Wikipedia for information."""
    ...  # implement Wikipedia API call

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    ...  # implement weather API call

tools = [search_wikipedia, get_weather]

# Create a ReAct agent (Reasoning + Acting)
model = ChatOpenAI(model="gpt-4")
prompt = hub.pull("hwchase17/react")  # a standard ReAct prompt
agent = create_react_agent(model, tools, prompt)

# Execute the agent
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({
    "input": "What's the weather like in Paris, and what's famous there?"
})
# Agent decides: "I need weather + Wikipedia search"
# → calls both tools iteratively
# → synthesizes an answer
```

Agent Loop

The executor repeats a Thought → Action → Observation cycle: the model reasons about what to do next, picks a tool, sees the result, and continues until it emits a final answer.
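The loop itself can be sketched in plain Python with a stubbed model and a hypothetical tool (`fake_llm` and `get_weather` here are stand-ins, not LangChain code):

```python
# Stdlib sketch of the ReAct loop: Thought -> Action -> Observation,
# repeated until a final answer or the iteration cap is reached.

def fake_llm(scratchpad):
    """Stand-in for the model: decide the next action from context so far."""
    if "18C" not in scratchpad:
        return ("get_weather", "Paris")
    return ("FINAL", "It is 18C in Paris.")

tools = {"get_weather": lambda loc: f"weather in {loc}: 18C"}

def run_agent(question, max_iterations=5):
    scratchpad = question
    for _ in range(max_iterations):        # cap prevents runaway loops
        action, arg = fake_llm(scratchpad)
        if action == "FINAL":
            return arg
        observation = tools[action](arg)   # act, then observe
        scratchpad += f"\n{action}({arg}) -> {observation}"
    return "Stopped: iteration limit reached"

print(run_agent("What's the weather in Paris?"))  # It is 18C in Paris.
```

The scratchpad accumulating tool results is what LangChain calls the agent's intermediate steps; `max_iterations` plays the same role as the `AgentExecutor` parameter of that name.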

Structured Output Agents

For deterministic parsing, use structured agents that output JSON:

```python
from langchain.agents import create_structured_chat_agent

# The agent emits JSON-formatted actions instead of free text
agent = create_structured_chat_agent(
    llm=model,
    tools=tools,
    prompt=prompt,
)
# Structured actions are easier to parse reliably
```
Agent Tip: Agents are powerful but can hallucinate tool calls or get stuck in loops. Use `verbose=True` to debug, set `max_iterations` to prevent runaway loops, and write clear tool descriptions.
SECTION 06

LangSmith Integration

LangSmith is LangChain's production platform for tracing, evaluating, and monitoring LLM applications.

Features

Tracing (full visibility into every chain and agent run), evaluation (score outputs against datasets), and monitoring (track production behavior over time).

Setup

```python
import os

# Set the API key and enable tracing
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "my-rag-app"

# All LangChain calls are now traced automatically;
# view them at https://smith.langchain.com/
chain.invoke({"query": "..."})
# Trace appears in the LangSmith dashboard
```

Evaluation in LangSmith

```python
from langsmith import evaluate

# Define evaluators
def check_answer_length(output: dict) -> bool:
    """Evaluator: answer should be under 200 words."""
    return len(output["answer"].split()) < 200

def check_factuality(output: dict) -> bool:
    """Evaluator: use an LLM judge for factuality."""
    judge = ChatOpenAI(model="gpt-4")
    eval_prompt = (
        "Is this answer factually correct?\n"
        f"Output: {output['answer']}\n"
        "Return: yes/no"
    )
    result = judge.invoke(eval_prompt)
    return "yes" in result.content.lower()

# Run the evaluation
results = evaluate(
    lambda inputs: rag_chain.invoke(inputs),
    data=[
        {"question": "What is photosynthesis?", "expected": "..."},
        {"question": "How does DNA work?", "expected": "..."},
    ],
    evaluators=[check_answer_length, check_factuality],
    experiment_prefix="rag-v1",
)
# Results land in LangSmith; compare against previous versions
```

LangSmith Dashboard Insights

The dashboard breaks traces down by latency, token usage, and cost, making it easy to spot slow or expensive steps in a chain.

LangSmith Cost: There is a free tier for individual developers with a monthly trace allowance; paid plans add volume, retention, and team features. Worth it for production monitoring, but unnecessary for hobby projects.
SECTION 07

LangChain vs Alternatives

LangChain dominates but isn't alone. Comparison with key alternatives:

| Framework | Best For | Strengths | Weaknesses |
|---|---|---|---|
| LangChain | General-purpose LLM apps, RAG, agents | Massive ecosystem, LCEL elegance, production-ready (LangSmith) | Can be verbose; large codebase; frequent API changes |
| LlamaIndex | RAG, document indexing, data connectors | Best-in-class RAG; 100+ data connectors; auto-indexing strategies | Less focused on agents; smaller ecosystem |
| LangGraph | Complex workflows, agents with memory | Explicit control flow; graph-based reasoning; built by the LangChain team | Newer, fewer examples; learning curve for graph thinking |
| Pydantic AI | Type-safe agents, structured outputs | Strong typing; structured validation; clean API | Newer, less mature; smaller community |
| Raw SDK (openai, anthropic) | Simple scripts, full control | Lightweight, no dependencies; direct model access | Manual chain management, boilerplate, no abstractions |

When to Use Each

Use LangChain if:
✓ Building a production RAG or agent system
✓ You need integrations with many vector stores and databases
✓ You want LangSmith monitoring
✓ The team is already familiar with it (most common in industry)

Use LlamaIndex if:
✓ The focus is RAG and document indexing
✓ You need auto-indexing strategies (hierarchical, hybrid, etc.)
✓ You work with semi-structured data (Notion, Google Docs)

Use LangGraph if:
✓ Complex multi-turn workflows with explicit control
✓ You need shared state across agent loops
✓ You are building stateful applications with memory

Use the raw SDK if:
✓ It's a simple script or MVP
✓ Full control matters more than convenience
✓ You want zero dependencies

LangChain + LlamaIndex Hybrid

In practice, many teams use both: LlamaIndex for RAG indexing, LangChain for agents and orchestration:

```python
# LlamaIndex for indexing
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
llama_retriever = index.as_retriever()

# Wrap it behind a LangChain Tool interface
from langchain.tools import Tool

retriever_tool = Tool(
    name="document_search",
    func=lambda q: "\n".join(d.text for d in llama_retriever.retrieve(q)),
    description="Search documents",
)

# Use it in a LangChain agent
agent = create_react_agent(model, [retriever_tool, ...])
```
Recommendation: Start with LangChain for general work. If RAG becomes complex, add LlamaIndex for indexing. If agents get complex, migrate to LangGraph. Most production systems use LangChain + LangSmith as the core stack.
SECTION 08

LangChain in Production

LangChain's modular design makes it easy to prototype, but the same modularity can cause issues in production if not managed carefully. The most common pain point is callback propagation: callbacks registered at the chain level do not automatically propagate to sub-chains unless you explicitly pass them through. Always use RunnableConfig to pass callbacks and metadata, not positional constructor arguments, when building production pipelines.

For observability, enable LangSmith tracing in all production environments: it captures every input/output pair in a chain run, making debugging failures in multi-hop pipelines tractable. Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_PROJECT to your project name. Use trace tags to segment production vs evaluation traffic so you can run cost and quality reports per environment.

Versioning is another production concern: LangChain releases frequently and breaking changes are common. Pin your dependency to a specific minor version (langchain==0.3.x) and test upgrades on a shadow pipeline before promoting. The LangChain Expression Language (LCEL) is more stable than the legacy chain API; if you are still using LLMChain or SequentialChain, migrate to LCEL before scaling, as the legacy API is in maintenance mode.