Architecture · Tooling

LLM Development Frameworks

LangChain, LlamaIndex, Haystack, and Semantic Kernel — when to use each and what they actually give you

6 frameworks
7 sections
Python-first SDKs
Contents
  1. Framework landscape
  2. Framework comparison
  3. LangChain deep-dive
  4. LlamaIndex deep-dive
  5. Haystack deep-dive
  6. When to use what
  7. References
01 — Overview

Framework Landscape

LLM development frameworks abstract away boilerplate. They provide building blocks: chains (sequential operations), prompts (templates), retrievers (vector search), agents (autonomous decision-making), and memory (conversation context). Each framework takes a different architectural approach, leading to different tradeoffs in simplicity, flexibility, and observability.

Why Frameworks Exist

Raw LLM APIs are low-level. Building a chat application requires: prompt template management, context window management, tool invocation loops, error handling, retries, and logging. Frameworks handle this. But they also impose patterns—some more opinionated than others.
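To make that concrete, here is a framework-free sketch of just two of those concerns, prompt templating and retries with backoff. `call_model` is a hypothetical stand-in for a real LLM API call, not any library's function:

```python
import time

PROMPT = "Write a haiku about {topic}"

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call; a real one can raise transient errors.
    return f"[completion for: {prompt}]"

def complete(topic: str, retries: int = 3, backoff: float = 1.0) -> str:
    prompt = PROMPT.format(topic=topic)          # prompt template management
    for attempt in range(retries):
        try:
            return call_model(prompt)            # the actual LLM call
        except Exception:
            if attempt == retries - 1:
                raise                            # give up after the final attempt
            time.sleep(backoff * 2 ** attempt)   # exponential backoff between retries

print(complete("machine learning"))
```

Multiply this by tool loops, streaming, and logging, and the appeal of a framework that has already made these decisions becomes clear.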

💡 Key decision: Do you want simplicity (more opinionated framework) or flexibility (less opinionated)? LangChain leans opinionated; LlamaIndex leans flexible.
02 — Comparison Matrix

Framework Comparison

| Framework | Primary Use | RAG Support | Agents | Community |
|---|---|---|---|---|
| LangChain | General-purpose chains | Excellent | Strong (ReAct) | Largest |
| LlamaIndex | RAG-first indexing | Outstanding | Good | Large |
| Haystack | NLP pipelines | Excellent | Emerging | Medium |
| Semantic Kernel | .NET integration | Good | Good | Growing |
| LangGraph | State machines | Good | Excellent | Emerging |
| DSPy | Program optimization | Good | Good | Research |
03 — Most Popular

LangChain Deep-Dive

LangChain is the largest and most opinionated framework. It pioneered the "chain" abstraction: composable components linked together. Core concepts: Chains (sequences), Runnables (modern execution model), Memory (conversation state), Tools (function calling), and Callbacks (observability).

LangChain Core Concepts

⛓️ LCEL Chains

  • Composable units of work
  • Pipe operator for chaining
  • Streaming + batching built-in
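The pipe composition can be illustrated without LangChain at all. A minimal sketch of the idea behind LCEL's `|` operator; the `Runnable` class here is a toy, not LangChain's:

```python
class Runnable:
    """Toy stand-in for LCEL's composable unit of work."""
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # `a | b` returns a new Runnable that feeds a's output into b.
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda topic: f"Write a haiku about {topic}")
llm = Runnable(lambda p: p.upper())   # fake model: just shouts the prompt back
parse = Runnable(lambda s: s.strip())

chain = prompt | llm | parse          # the pipe operator chains the steps
print(chain.invoke("rust"))           # WRITE A HAIKU ABOUT RUST
```

The real LCEL runtime adds streaming, batching, and async on top of this same composition idea.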

🧠 Memory Management

  • Conversation history tracking
  • Context window awareness
  • Retrieval augmented context
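A toy illustration of buffer-style memory with context-window trimming. Character counts stand in for token counts here; real frameworks budget in tokens:

```python
class BufferMemory:
    """Toy conversation memory with a crude context-window budget."""
    def __init__(self, max_chars: int = 200):
        self.turns: list[str] = []
        self.max_chars = max_chars

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")
        # Evict oldest turns once the rendered history exceeds the budget.
        while len(self.render()) > self.max_chars and len(self.turns) > 1:
            self.turns.pop(0)

    def render(self) -> str:
        return "\n".join(self.turns)

mem = BufferMemory(max_chars=70)
mem.add("user", "Hi, I'm building a RAG app.")
mem.add("assistant", "Great! What stack?")
mem.add("user", "Python, with a vector store.")
print(mem.render())  # first turn was evicted to fit the window
```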

🔧 Tools & Agents

  • Function-calling tool definitions
  • ReAct-style agent loops

📊 LangSmith

  • Tracing and debugging
  • Evaluation and monitoring

Example: Simple Chain

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a haiku about {topic}",
)
chain = LLMChain(llm=llm, prompt=prompt)

result = chain.run(topic="machine learning")
print(result)

Pros & Cons

  • Pros: largest community, strong agent support, streaming and batching built into LCEL, first-party observability via LangSmith
  • Cons: the most opinionated of the frameworks; its abstractions can add overhead for simple use cases

04 — RAG-First

LlamaIndex Deep-Dive

LlamaIndex is purpose-built for RAG. Core concepts: Data Loaders (ingestion), Indices (organization), Query Engines (retrieval + reranking), and Retrievers (vector search). The framework's focus is data indexing and retrieval quality.

LlamaIndex Architecture

📥 Data Loaders

  • Load from 100+ sources
  • SimpleDirectoryReader
  • Custom loader support

📑 Indices

  • VectorStoreIndex (embedding-based)
  • TreeIndex (hierarchical)
  • SummaryIndex (flat)
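The idea behind an embedding-based index can be sketched in plain Python: embed at index time, score by similarity at query time. Here a bag-of-words `Counter` is a deliberately crude stand-in for real dense embeddings, and `ToyVectorIndex` is illustrative, not LlamaIndex's API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorIndex:
    def __init__(self, docs):
        self.docs = docs
        self.vectors = [embed(d) for d in docs]   # index-time embedding

    def query(self, q: str, k: int = 1):
        qv = embed(q)
        scored = sorted(self.docs, key=lambda d: cosine(embed(d), qv), reverse=True)
        return scored[:k]

index = ToyVectorIndex([
    "RAG retrieves documents before generation",
    "Transformers use attention",
])
print(index.query("what is RAG"))
```

VectorStoreIndex does the same two-phase work with real embedding models and a pluggable vector store.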

🔍 Query Engines

  • Retrieve + rerank over an index
  • as_query_engine() entry point

🤖 Agents

  • Query engines exposed as agent tools

Example: RAG Pipeline

from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")
print(response)

Pros & Cons

  • Pros: purpose-built for RAG with minimal overhead; rich loader and index options
  • Cons: weaker fit for complex multi-tool agents, where LangChain is stronger

05 — NLP-First

Haystack Deep-Dive

Haystack (by deepset) is built on NLP-first thinking. Concepts: Pipelines (YAML-configurable workflows), Components (pluggable NLP nodes), Document Stores (retrieval), and Evaluators (metrics). Designed for production NLP systems.

When to Use Haystack

Haystack Pipeline Example

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Build pipeline (the BM25 retriever needs a document store)
document_store = InMemoryDocumentStore()
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=document_store))
pipeline.add_component("ranker", TransformersSimilarityRanker())

# Connect components
pipeline.connect("retriever.documents", "ranker.documents")

# Run (the ranker also needs the query to score similarity)
result = pipeline.run({
    "retriever": {"query": "machine learning"},
    "ranker": {"query": "machine learning"},
})
print(result)
06 — Decision Framework

When to Use What

| Scenario | Best Choice | Why |
|---|---|---|
| Simple RAG (docs → search → LLM) | LlamaIndex | Purpose-built, minimal overhead |
| Complex agents (multi-tool, memory) | LangChain | Strongest agent framework |
| Enterprise NLP pipeline | Haystack | Production-grade, evaluation tools |
| .NET/C# application | Semantic Kernel | Native .NET integration |
| Agentic state machine | LangGraph | Explicit state control |
| Prompt optimization | DSPy | Automated prompt engineering |

Migration Paths

Start simple with LlamaIndex for RAG, graduate to LangChain if you need agents. Both work together—use LlamaIndex's query engine as a tool in LangChain agents.
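Stripped of framework APIs, that interop pattern amounts to: the query engine is one callable in the agent's tool registry. A framework-free sketch, where `docs_query_engine` is a hypothetical stand-in for a query engine and the keyword routing rule stands in for the LLM's tool choice:

```python
def docs_query_engine(question: str) -> str:
    # Stand-in for a RAG query engine's query method.
    return f"[answer from indexed docs for: {question}]"

def calculator(expr: str) -> str:
    # Toy second tool; never eval untrusted input in real code.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"docs": docs_query_engine, "calc": calculator}

def agent(task: str) -> str:
    # A real agent lets the LLM pick the tool; a keyword rule stands in here.
    tool = "calc" if any(c.isdigit() for c in task) else "docs"
    return TOOLS[tool](task)

print(agent("2+2"))            # routed to calc -> "4"
print(agent("what is RAG?"))   # routed to docs
```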

⚠️ Avoid framework hopping: All frameworks are viable. Pick one, go deep, and switch only if you hit real constraints (not theoretical ones).
Tools & SDKs

Framework & Tool Ecosystem

  • LangChain (Framework): general-purpose LLM application framework with chains, agents, and memory
  • LlamaIndex (Framework): RAG-focused indexing and retrieval framework
  • Haystack (Framework): NLP-first pipeline and component framework
  • Semantic Kernel (Framework): Microsoft's .NET LLM framework
  • LangGraph (Agentic): state-machine framework for agents (a LangChain layer)
  • DSPy (Optimization): automatic prompt optimization framework
  • LangSmith (Observability): debugging, evaluation, and monitoring (LangChain)
  • Pinecone (Vector DB): managed vector database (works with all frameworks)