Retrieval & RAG

GraphRAG Overview

A retrieval augmented generation approach that uses knowledge graphs and hierarchical community detection to capture cross-document relationships and enable sophisticated multi-hop reasoning over large document collections.

Microsoft Release: 2024
Knowledge Graph: Entities + Relations
Query Modes: Global + Local

Table of Contents

  1. Why GraphRAG
  2. How GraphRAG Works
  3. Indexing Pipeline
  4. Query Modes
  5. Implementation with Microsoft GraphRAG
  6. GraphRAG vs Standard RAG
  7. When to Use GraphRAG
  8. GraphRAG Performance Tuning
SECTION 01

Why GraphRAG

Standard RAG chunks the documents, embeds each chunk independently, retrieves the k most similar chunks for a query, and passes them to an LLM for synthesis. This works well for factual lookup—"What is the capital of France?"—but struggles with complex cross-document reasoning—"What are the relationships between Company A's executives and Company B's competitors, across all 500 documents?"—because the relevant information is scattered across many chunks, and flat semantic similarity cannot capture their interconnected structure.

GraphRAG solves this by building a knowledge graph of entities (persons, organizations, concepts) and their relationships (works_at, competes_with, founded). This graph preserves the semantic structure of the corpus, making it possible to answer questions that require connecting facts across documents. Furthermore, GraphRAG applies community detection algorithms (like the Leiden algorithm) to identify clusters of tightly connected entities, and generates summaries for each community. During retrieval, both local (entity-centric) and global (community-level) search modes are available, enabling better coverage of relevant context.

In Microsoft's benchmarks, GraphRAG achieves 40% better accuracy on multi-hop questions and 15% better accuracy on single-hop questions than standard dense-retrieval RAG, with lower hallucination rates.

Core Insight: Knowledge graphs + hierarchical community detection enable "retrieval by relationship" rather than "retrieval by similarity." The result: better coverage of relevant context, stronger support for multi-hop reasoning, and fewer hallucinations.
SECTION 02

How GraphRAG Works

GraphRAG consists of two phases: indexing (offline) and querying (online).

Indexing Phase: (1) Split documents into chunks. (2) For each chunk, extract entities (named entity recognition) and relationships (using an LLM or extractor). (3) Construct a knowledge graph: nodes are entities, edges are relationships with metadata. (4) Apply community detection (Leiden algorithm) to partition the graph into hierarchical levels. (5) For each community, generate a summary (using an LLM). (6) Store the graph, communities, and summaries in a database.

Query Phase: (1) Parse the user's query. (2) Extract query entities and determine the appropriate search mode (local or global). (3) Local search: find entities mentioned in the query, retrieve their 1-hop and 2-hop neighbors in the graph, and collect their summaries. (4) Global search: retrieve summaries of relevant communities based on the query topic. (5) Aggregate retrieved summaries into context. (6) Pass context to the LLM for synthesis.

The key innovation: instead of embedding and matching raw text, the system works with structured graph relationships and hierarchical communities. This preserves meaning across large document spans and enables more sophisticated retrieval logic.

Hierarchy Levels: The Leiden algorithm produces hierarchical communities: Level 0 (finest) has small, tightly-knit clusters; Level 1 has larger communities (collections of Level 0 clusters); etc. Queries can search at the appropriate level—fine-grained for specific entity relationships, coarse-grained for high-level topic discovery.
SECTION 03

Indexing Pipeline in Depth

Step 1: Document Chunking
Split documents into overlapping chunks (e.g., 1200 tokens with 100-token overlap) to preserve local context and enable entity references that span chunks.
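The chunking step above can be sketched in a few lines; here token counts are approximated by whitespace-split words, so a real pipeline would count with the model's tokenizer instead:

```python
def chunk_document(text, size=1200, overlap=100):
    """Split text into overlapping chunks of roughly `size` tokens.

    Tokens are approximated as whitespace-separated words; consecutive
    chunks share `overlap` tokens so entity mentions near a boundary
    appear in both chunks.
    """
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
    return chunks
```

With the default 1200/100 settings, a 3000-token document yields three chunks, and each chunk's trailing 100 tokens reappear at the start of the next.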

Step 2: Entity Extraction
For each chunk, use an LLM (e.g., GPT-4 with few-shot examples) to extract entities and their types (PERSON, ORGANIZATION, LOCATION, CONCEPT). The LLM is prompted: "Extract all entities of type [type] from this text." Named Entity Recognition (NER) models can also be used for faster, lighter extraction, though LLMs generally achieve better precision, particularly for entities involved in relationships that span multiple sentences.

Step 3: Relationship Extraction
For each chunk, prompt an LLM to extract relationships: "What relationships exist between the extracted entities? List them as (source, relationship_type, target)." Examples: (Alice, WORKS_AT, Acme Corp), (Acme Corp, COMPETES_WITH, TechCorp). This produces a list of edges for the graph.
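A minimal sketch of turning the LLM's response into edge tuples, assuming the model was prompted to reply with the JSON schema shown in the extraction prompt below; `parse_relationships` and the sample response are illustrative, not part of any library:

```python
import json

def parse_relationships(llm_response: str):
    """Parse a JSON completion into (source, relationship, target) tuples."""
    data = json.loads(llm_response)
    return [
        (r["source"], r["relationship"], r["target"])
        for r in data.get("relationships", [])
    ]

# Stand-in for a raw LLM completion following the prompt's JSON schema.
llm_response = '''{"relationships": [
    {"source": "Alice", "relationship": "WORKS_AT", "target": "Acme Corp"},
    {"source": "Acme Corp", "relationship": "COMPETES_WITH", "target": "TechCorp"}
]}'''
edges = parse_relationships(llm_response)
```

In practice the parse should be wrapped in error handling and a retry, since models occasionally emit malformed JSON.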

Step 4: Graph Construction
Combine entity and relationship data across all chunks into a single knowledge graph. Nodes are deduplicated by entity name (with optional fuzzy matching to handle synonyms). Edges are weighted by frequency (if an entity relationship appears in 5 chunks, the edge weight is higher). The result is a potentially large graph (thousands to millions of nodes for large corpora).
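A dependency-free sketch of this merge step, deduplicating nodes by lower-cased name (a crude stand-in for real fuzzy matching) and weighting each edge by the number of chunks it appears in:

```python
from collections import Counter

def build_graph(chunk_edges):
    """Merge per-chunk extractions into one weighted edge list.

    chunk_edges: a list with one entry per chunk, each a list of
    (source, relationship_type, target) tuples.
    Returns (nodes, weights) where weights maps a normalized edge key
    to the number of chunks that edge appeared in.
    """
    weights = Counter()
    nodes = set()
    for edges in chunk_edges:
        for src, rel, dst in set(edges):  # count each edge once per chunk
            weights[(src.lower(), rel, dst.lower())] += 1
            nodes.update([src.lower(), dst.lower()])
    return nodes, dict(weights)
```

An edge seen in five chunks ends up with weight 5, which later biases traversal and community detection toward well-attested relationships.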

Entity and Relationship Extraction Prompt: You are an expert at extracting entities and relationships from text. TASK: Extract all entities and relationships from the following text. ENTITIES: Extract entities and classify them as PERSON, ORGANIZATION, LOCATION, CONCEPT, or PRODUCT. RELATIONSHIPS: Extract relationships in the form (source_entity, relationship_type, target_entity). Relationship types include: WORKS_AT, FOUNDED, COMPETES_WITH, OWNS, SUPPLIES, LOCATED_IN, etc. TEXT: --- Alice Chen is the Chief Technology Officer at TechCorp. TechCorp, founded in 2015, develops AI software and competes directly with InnovateLabs, which is based in Silicon Valley. Alice previously worked at CloudSystems. --- OUTPUT (JSON): { "entities": [ {"name": "Alice Chen", "type": "PERSON"}, {"name": "TechCorp", "type": "ORGANIZATION"}, {"name": "InnovateLabs", "type": "ORGANIZATION"}, {"name": "CloudSystems", "type": "ORGANIZATION"}, {"name": "Silicon Valley", "type": "LOCATION"} ], "relationships": [ {"source": "Alice Chen", "relationship": "WORKS_AT", "target": "TechCorp"}, {"source": "TechCorp", "relationship": "FOUNDED", "target": "2015"}, {"source": "TechCorp", "relationship": "COMPETES_WITH", "target": "InnovateLabs"}, {"source": "InnovateLabs", "relationship": "LOCATED_IN", "target": "Silicon Valley"}, {"source": "Alice Chen", "relationship": "PREVIOUSLY_WORKED_AT", "target": "CloudSystems"} ] }

Step 5: Community Detection
Apply the Leiden algorithm (similar to Louvain, but with better quality and stability) to partition the graph into communities. This produces hierarchical levels where each node belongs to a hierarchy of communities. For example, Alice Chen belongs to Community_Level0_5, which belongs to Community_Level1_2, etc.
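networkx ships Louvain but not Leiden, so the sketch below uses `louvain_communities` as a stand-in (the `leidenalg`/`python-igraph` packages provide true Leiden). It produces one flat level; a hierarchy would come from re-running detection on the graph of communities:

```python
import networkx as nx

# Toy entity graph with frequency weights from the construction step.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Alice", "TechCorp", 3), ("TechCorp", "InnovateLabs", 2),
    ("Alice", "CloudSystems", 1), ("Bob", "DataInc", 2),
    ("DataInc", "MLWorks", 1),
])

# Louvain partition (stand-in for Leiden): a list of disjoint sets of
# entity names covering every node in the graph.
communities = nx.community.louvain_communities(G, weight="weight", seed=42)
```

Each resulting set becomes one Level-0 community; collapsing each set to a super-node and detecting again would yield Level 1, and so on.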

Step 6: Community Summarization
For each community (at each level), generate a summary: prompt an LLM with the entities and relationships in that community and ask for a natural-language summary. Example: "Here are the entities and relationships in a community: [list]. Summarize the key themes and relationships in 2-3 sentences." Store these summaries in the database.
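The summarization prompt described above might be assembled like this; `community_prompt` is a hypothetical helper, and the actual LLM call is left out:

```python
def community_prompt(entities, edges, max_sentences=3):
    """Render a community's entities and edges into a summarization prompt."""
    entity_lines = "\n".join(f"- {e}" for e in sorted(entities))
    edge_lines = "\n".join(f"- ({s}, {r}, {t})" for s, r, t in edges)
    return (
        "Here are the entities and relationships in a community:\n"
        f"Entities:\n{entity_lines}\n"
        f"Relationships:\n{edge_lines}\n"
        f"Summarize the key themes and relationships in "
        f"{max_sentences} sentences or fewer."
    )
```

Batching several small communities into one call (per the `batch_size` idea later in this document) amortizes per-request overhead during this step.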

Step 7: Storage & Indexing
Store the graph, community assignments, summaries, and chunks in a database (e.g., Neo4j for the graph, PostgreSQL for metadata). Create indices on entity names and community IDs for fast lookup during querying.

LLM Costs in Indexing: Entity and relationship extraction using LLMs is expensive: for a 1M-token corpus, expect thousands of LLM API calls. Consider using smaller, faster models for extraction (e.g., Llama 2 locally) and reserve expensive models for community summarization.
SECTION 04

Query Modes: Local & Global Search

Local Search: Focused on entities mentioned in the query. The system retrieves the query entities from the graph, finds their immediate neighbors (1-hop and 2-hop relationships), and collects summaries for those entities and their communities. Local search excels at answering entity-specific questions and understanding relationships around known entities. Example: "What are Alice Chen's roles and relationships?" Local search finds Alice, retrieves her connections (works_at, worked_at, etc.), and synthesizes the response.
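The neighborhood-expansion step can be sketched as a breadth-first search over a plain adjacency dict; a graph database would typically do this traversal server-side:

```python
from collections import deque

def k_hop_neighbors(adj, start, k=2):
    """Return {neighbor: hop_distance} for all nodes within k hops of start.

    adj maps a node to a list of its neighbors.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # at the hop limit: do not expand further
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    seen.pop(start)
    return seen
```

Local search would then fetch the stored summaries for each returned entity and its communities before synthesis.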

Global Search: Answers questions about overall themes or patterns in the corpus without relying on specific entities. The system retrieves relevant community summaries based on semantic similarity to the query. For example, "What are the main competitors in the market?" Global search retrieves communities related to competition, companies, and markets, synthesizes their summaries, and generates an answer. Global search is better for exploratory questions and discovering patterns.
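A toy sketch of ranking community summaries against the query, with a bag-of-words counter standing in for a real embedding model:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_communities(query, summaries, top_k=2):
    """Return the ids of the top_k community summaries most similar to query."""
    q = embed(query)
    scored = sorted(summaries.items(),
                    key=lambda kv: cosine(q, embed(kv[1])), reverse=True)
    return [cid for cid, _ in scored[:top_k]]
```

A production system would embed summaries once at index time and use an approximate-nearest-neighbor index rather than scoring every community per query.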

Drift Search: A hybrid mode where local and global results are blended. Start with local entity-centric results, then augment with global community summaries if coverage is low.

Query Processing with Local/Global:

Query: "What companies has Alice Chen worked for?" → Local Search
  1. Extract entities: Alice Chen
  2. Find Alice in graph
  3. Retrieve WORKS_AT and PREVIOUSLY_WORKED_AT edges
  4. Collect: TechCorp, CloudSystems
  5. Retrieve summaries for these entities & related communities
  6. LLM synthesizes: "Alice Chen has worked at..."

Query: "What market trends are affecting tech companies?" → Global Search
  1. Extract query intent: trends, market, tech companies
  2. Retrieve communities related to: market, trends, technology
  3. Collect summaries of relevant communities
  4. LLM synthesizes: "Key market trends include..."
Search Mode Selection: GraphRAG can automatically select search mode based on query structure. Queries with named entities → local. Abstract, thematic queries → global. The implementation can also return results from both modes and rank them.
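The entity-based heuristic might look like this minimal sketch; real systems may instead use an LLM classifier or run both modes and rank:

```python
def select_mode(query, known_entities):
    """Route a query: 'local' if it names a known entity, else 'global'.

    known_entities is the list of entity names from the knowledge graph;
    matching here is naive substring containment.
    """
    q = query.lower()
    hits = [e for e in known_entities if e.lower() in q]
    return ("local", hits) if hits else ("global", [])
```

The matched entities double as the seed nodes for the local search's neighborhood expansion.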
SECTION 05

Implementation with Microsoft GraphRAG

Microsoft's GraphRAG library (https://github.com/microsoft/graphrag) is open-source and designed for easy integration. Here's a typical workflow:

GraphRAG Python Setup:

from graphrag.index import create_index
from graphrag.query import query_local, query_global
import asyncio

# Configuration
config = {
    "llm": {
        "type": "openai",
        "api_key": "sk-...",
        "model": "gpt-4",
    },
    "chunking": {"size": 1200, "overlap": 100},
    "entity_extraction": {
        "enabled": True,
        "types": ["PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"],
    },
    "relationship_extraction": {"enabled": True},
    "community_detection": {
        "algorithm": "leiden",
        "levels": 3,  # 3 hierarchy levels
    },
    "summarization": {"enabled": True, "batch_size": 10},
}

# Index documents (offline phase)
documents = [
    {"id": "doc1", "text": "...document 1..."},
    {"id": "doc2", "text": "...document 2..."},
    # ... more documents
]
index = create_index(
    documents=documents,
    config=config,
    storage_path="./graphrag_index",
)

# Query documents (online phase)
async def query_corpus():
    # Local search
    local_results = await query_local(
        index=index,
        query="What companies has Alice worked for?",
        top_k=3,
    )
    print("Local results:", local_results)

    # Global search
    global_results = await query_global(
        index=index,
        query="What are market trends affecting tech?",
        top_k=5,
    )
    print("Global results:", global_results)

asyncio.run(query_corpus())

Configuration Options: LLM choice (OpenAI, local Llama, etc.), chunking strategy, entity/relationship types to extract, community detection parameters (algorithm, number of levels), and summarization batch size. Larger corpora benefit from more hierarchy levels and finer-grained community detection.

Deployment: Index once (can take hours for large corpora), then serve queries via an API. Cache community summaries to speed up global search. For very large graphs (millions of nodes), consider sharding by topic or document collection.

Customization: You can plug in custom entity extractors, relationship classifiers, or community detection algorithms. For domain-specific knowledge graphs (e.g., scientific papers), fine-tune the entity types and relationship taxonomies.
SECTION 06

GraphRAG vs Standard RAG

Aspect                        | Standard RAG                             | GraphRAG
------------------------------|------------------------------------------|------------------------------------------
Retrieval Mechanism           | Semantic similarity (embedding-based)    | Graph relationships + community summaries
Multi-Hop Q&A                 | Struggles; requires lucky chunk overlap  | Strong; follows graph paths naturally
Hallucination Rate            | Higher; less grounded in structure       | Lower; grounded in explicit graph
Coverage on Complex Questions | Lower; limited context per query         | Higher; retrieves all related communities
Indexing Cost                 | Low; just chunk and embed                | High; LLM-based extraction and summarization
Query Speed                   | Fast; vector similarity search           | Medium; graph traversal + LLM synthesis
Interpretability              | Low; black-box embeddings                | High; explicit entities and relationships
Domain Adaptation             | Fine-tune embeddings                     | Define entity/relationship types and taxonomies

Cost-Benefit Analysis: GraphRAG's higher indexing cost (LLM extraction and summarization) is justified for corpora where multi-hop reasoning and coverage are critical (e.g., legal documents, scientific literature, enterprise knowledge bases). For simple fact lookup or small corpora, standard RAG is faster and cheaper. Many organizations use GraphRAG as a second-pass system: standard RAG retrieves initial candidates, then GraphRAG refines with graph-based reasoning.

Hybrid Approach: Combine standard RAG (fast, baseline) with GraphRAG (slower, higher-quality) in a two-stage pipeline. For straightforward queries, return standard RAG results. For complex queries, augment or replace with GraphRAG results. This balances cost and quality.
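One way to sketch such a two-stage router, with `standard_rag` and `graphrag_query` as placeholder backends and a naive keyword heuristic standing in for a real complexity classifier:

```python
# Words that often signal multi-hop, relationship-style questions
# (an assumption for illustration; tune for your query logs).
COMPLEX_MARKERS = ("relationship", "between", "across", "connect", "compare")

def route_query(query, standard_rag, graphrag_query):
    """Send complex queries to GraphRAG, simple ones to standard RAG."""
    if any(m in query.lower() for m in COMPLEX_MARKERS):
        return graphrag_query(query)  # slower, multi-hop capable
    return standard_rag(query)        # fast baseline
```

A stronger variant runs standard RAG first and escalates to GraphRAG only when retrieval confidence or answer coverage is low.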
SECTION 07

When to Use GraphRAG

GraphRAG is ideal when: (1) Your queries require multi-hop reasoning across documents (e.g., "What is the relationship between entities in document A and entities in document B?"). (2) The corpus is large and diverse, with interconnected entities and relationships. (3) Accuracy and low hallucination are critical. (4) Interpretability matters—stakeholders want to know why a particular answer was retrieved. (5) You have the budget for LLM-based indexing (entity and relationship extraction).

Standard RAG is sufficient when: (1) Queries are factual lookups or single-document questions. (2) The corpus is small or homogeneous. (3) Speed and cost are paramount. (4) Chunk-level retrieval provides sufficient context. (5) Your embedding model is well-aligned with your domain.

Concrete Use Cases:
GraphRAG excels in: legal document analysis (cross-referencing contracts, precedents, parties), scientific literature synthesis (connecting research across papers and labs), enterprise knowledge management (connecting people, projects, and organizational relationships), competitive intelligence (tracking company relationships, partnerships, market dynamics), and investigative journalism (finding connections between actors, events, and organizations).

Decision Tree:

Do you need multi-hop reasoning?
  YES → Consider GraphRAG
    Large corpus with many entities/relationships?
      YES → Use GraphRAG (full implementation)
      NO  → Standard RAG + fact verification (hybrid)
  NO → Use standard RAG

How critical is hallucination reduction?
  Critical (legal, medical) → Use GraphRAG
  Important                 → Hybrid approach
  Less critical             → Standard RAG

Implementation Timeline: Standard RAG (1-2 days to implement and deploy). GraphRAG (2-4 weeks: design entity/relationship schema, extract and build graph, tune community detection, validate results). Budget accordingly.

Common Pitfalls: Over-extracting entities and relationships (too many nodes/edges makes the graph noisy and slow). Under-specifying entity types (generic "ENTITY" type loses semantic meaning). Insufficient hierarchy levels (makes community summaries too coarse or too fine). Not tuning the LLM prompts for your domain.
SECTION 08

GraphRAG Performance Tuning

Microsoft GraphRAG's indexing pipeline is expensive by design — it runs community detection (Leiden algorithm), entity extraction via LLM, and relationship summarization on the full corpus. For a 10M-token corpus, expect 4–8 hours of indexing time and $40–80 in API costs at Haiku pricing. To reduce this: (1) chunk aggressively (512 tokens is usually sufficient for entity extraction); (2) use a cheaper model for entity extraction passes and only run the expensive summarization on top-level community reports; (3) filter the input corpus to the documents most relevant to anticipated query types — GraphRAG scales quadratically in entity count, not linearly.
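A back-of-the-envelope helper for sizing the extraction pass makes the arithmetic above concrete; the chunk parameters are assumptions, and real costs also depend on output tokens and the summarization passes:

```python
def indexing_calls(corpus_tokens, chunk_size=512, overlap=64):
    """Approximate the number of extraction LLM calls (one per chunk).

    With overlapping chunks the effective step is chunk_size - overlap,
    so the call count is the ceiling of corpus_tokens / step.
    """
    step = chunk_size - overlap
    return -(-corpus_tokens // step)  # ceiling division

# For a 10M-token corpus with 512-token chunks and 64-token overlap,
# this yields roughly 22,000 extraction calls before any summarization
# passes -- the reason cheap models matter at this stage.
calls = indexing_calls(10_000_000, chunk_size=512, overlap=64)
```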

For query latency, global search is slower than local search because it aggregates community reports across the entire graph. Profile your top query patterns and pre-cache community report summaries for the most common anchor entities. Entity resolution is a hidden bottleneck: if the same concept appears under multiple surface forms (e.g. "GPT-4", "GPT4", "OpenAI GPT-4"), the graph will fragment rather than consolidate. Run a deduplication pass on entity names before building the graph — even a simple fuzzy-match threshold of 0.85 cosine similarity dramatically reduces fragmentation and improves global search quality.
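The deduplication pass might be sketched with `difflib`'s string ratio as a dependency-free stand-in for the embedding-based cosine threshold mentioned above; it catches hyphen/spacing variants like "GPT-4" vs "GPT4", though not true synonyms:

```python
from difflib import SequenceMatcher

def dedupe_entities(names, threshold=0.85):
    """Map each entity name to a canonical form via fuzzy string matching.

    The first name seen in a similarity cluster becomes its canonical form;
    normalization strips hyphens and spaces before comparing.
    """
    def norm(s):
        return s.lower().replace("-", "").replace(" ", "")

    canonical, mapping = [], {}
    for name in names:
        match = None
        for c in canonical:
            if SequenceMatcher(None, norm(name), norm(c)).ratio() >= threshold:
                match = c
                break
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping
```

Running this before graph construction consolidates surface-form variants into single nodes, so their edges accumulate weight instead of fragmenting the graph.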