A retrieval-augmented generation (RAG) approach that uses knowledge graphs and hierarchical community detection to capture cross-document relationships and enable sophisticated multi-hop reasoning over large document collections.
Standard RAG chunks the documents, embeds each chunk independently, retrieves the k most similar chunks for a query, and passes them to an LLM for synthesis. This works well for factual lookup ("What is the capital of France?") but struggles with complex, cross-document reasoning ("What are the relationships between Company A's executives and Company B's competitors, across all 500 documents?"), because the relevant information is scattered across many chunks and flat semantic similarity cannot capture the interconnected structure.
GraphRAG solves this by building a knowledge graph of entities (persons, organizations, concepts) and their relationships (works_at, competes_with, founded). This graph preserves the semantic structure of the corpus, making it possible to answer questions that require connecting facts across documents. Furthermore, GraphRAG applies community detection algorithms (like the Leiden algorithm) to identify clusters of tightly connected entities, and generates summaries for each community. During retrieval, both local (entity-centric) and global (community-level) search modes are available, enabling better coverage of relevant context.
In Microsoft's published benchmarks, GraphRAG achieves roughly 40% better accuracy on multi-hop questions and 15% better on single-hop questions than standard dense-retrieval RAG, with lower hallucination rates.
GraphRAG consists of two phases: indexing (offline) and querying (online).
Indexing Phase: (1) Split documents into chunks. (2) For each chunk, extract entities (named entity recognition) and relationships (using an LLM or extractor). (3) Construct a knowledge graph: nodes are entities, edges are relationships with metadata. (4) Apply community detection (Leiden algorithm) to partition the graph into hierarchical levels. (5) For each community, generate a summary (using an LLM). (6) Store the graph, communities, and summaries in a database.
Query Phase: (1) Parse the user's query. (2) Extract query entities and determine the appropriate search mode (local or global). (3) Local search: find entities mentioned in the query, retrieve their 1-hop and 2-hop neighbors in the graph, and collect their summaries. (4) Global search: retrieve summaries of relevant communities based on the query topic. (5) Aggregate retrieved summaries into context. (6) Pass context to the LLM for synthesis.
The key innovation: instead of embedding and matching raw text, the system works with structured graph relationships and hierarchical communities. This preserves meaning across large document spans and enables more sophisticated retrieval logic.
Step 1: Document Chunking
Split documents into overlapping chunks (e.g., 1200 tokens with 100-token overlap) to preserve local context and enable entity references that span chunks.
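A rough sketch of overlapping chunking (the function name is illustrative, and "tokens" here is just a Python list; a real pipeline would count model tokens, e.g. with a tokenizer like tiktoken):

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token list into overlapping chunks.

    Consecutive chunks share `overlap` tokens so that entity
    mentions spanning a chunk boundary appear in both chunks.
    """
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Overlap trades a small amount of duplicated extraction work for better recall on boundary-spanning entities.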
Step 2: Entity Extraction
For each chunk, use an LLM (e.g., GPT-4 with few-shot examples) to extract entities and their types (PERSON, ORGANIZATION, LOCATION, CONCEPT). The LLM is prompted: "Extract all entities of type [type] from this text." Dedicated Named Entity Recognition (NER) models can also be used for faster, lighter extraction, though LLMs generally achieve better precision, particularly for the domain-specific concepts that multi-hop reasoning depends on.
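A minimal sketch of this step: a prompt template plus a parser for a line-per-entity response. The prompt wording and the `TYPE: name` output format are illustrative assumptions, not the library's actual prompts, and real model output needs more defensive parsing:

```python
ENTITY_PROMPT = (
    "Extract all entities from the text below. Return one per line as "
    "TYPE: name, where TYPE is one of PERSON, ORGANIZATION, LOCATION, "
    "CONCEPT.\n\nText:\n{chunk}"
)

VALID_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "CONCEPT"}

def parse_entities(llm_output):
    """Parse 'TYPE: name' lines from the model's response,
    skipping anything that is not a well-formed entity line."""
    entities = []
    for line in llm_output.splitlines():
        if ":" not in line:
            continue
        etype, _, name = line.partition(":")
        etype, name = etype.strip().upper(), name.strip()
        if etype in VALID_TYPES and name:
            entities.append((etype, name))
    return entities
```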
Step 3: Relationship Extraction
For each chunk, prompt an LLM to extract relationships: "What relationships exist between the extracted entities? List them as (source, relationship_type, target)." Examples: (Alice, WORKS_AT, Acme Corp), (Acme Corp, COMPETES_WITH, TechCorp). This produces a list of edges for the graph.
Step 4: Graph Construction
Combine entity and relationship data across all chunks into a single knowledge graph. Nodes are deduplicated by entity name (with optional fuzzy matching to handle synonyms). Edges are weighted by frequency (if an entity relationship appears in 5 chunks, the edge weight is higher). The result is a potentially large graph (thousands to millions of nodes for large corpora).
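A dictionary-based sketch of the merge step, assuming the per-chunk triples from the previous step. Real implementations typically use a graph library (e.g. networkx) and smarter entity resolution than lowercasing:

```python
from collections import defaultdict

def build_graph(chunk_triples):
    """Merge per-chunk (source, relation, target) triples into one graph.

    Nodes are deduplicated case-insensitively (keeping the first
    surface form seen); edge weight counts how many chunks assert
    the same relationship.
    """
    nodes = {}                  # lowercase key -> display name
    edges = defaultdict(int)    # (src, relation, tgt) -> weight
    for triples in chunk_triples:
        for src, rel, tgt in triples:
            for name in (src, tgt):
                nodes.setdefault(name.lower(), name)
            edges[(src.lower(), rel, tgt.lower())] += 1
    return nodes, dict(edges)
```

Frequency-weighted edges let later stages (community detection, local search) prefer relationships the corpus asserts repeatedly.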
Step 5: Community Detection
Apply the Leiden algorithm (similar to Louvain, but with better quality and stability) to partition the graph into communities. This produces hierarchical levels where each node belongs to a hierarchy of communities. For example, Alice Chen belongs to Community_Level0_5, which belongs to Community_Level1_2, etc.
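Leiden itself requires the `igraph`/`leidenalg` packages. As a dependency-free stand-in, a toy label-propagation pass illustrates the core idea of nodes converging to their community's label; this is not Leiden and produces only a single, non-hierarchical level:

```python
def label_propagation(adj, iters=10):
    """Toy community detection: each node repeatedly adopts the most
    common label among its neighbours (ties broken by smallest label).

    `adj` maps each node to a list of its neighbours.
    """
    labels = {n: n for n in adj}
    for _ in range(iters):
        changed = False
        for node in sorted(adj):
            counts = {}
            for nb in adj[node]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            if counts:
                best = min(counts, key=lambda lab: (-counts[lab], lab))
                if best != labels[node]:
                    labels[node] = best
                    changed = True
        if not changed:
            break
    return labels
```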
Step 6: Community Summarization
For each community (at each level), generate a summary: prompt an LLM with the entities and relationships in that community and ask for a natural-language summary. Example: "Here are the entities and relationships in a community: [list]. Summarize the key themes and relationships in 2-3 sentences." Store these summaries in the database.
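Assembling that prompt is mechanical; a sketch of the builder, following the example wording above (the exact template is illustrative):

```python
def community_summary_prompt(entities, relationships):
    """Build the summarization prompt for one community from its
    entity names and (source, relation, target) edge triples."""
    edge_lines = [f"- ({s}, {r}, {t})" for s, r, t in relationships]
    return (
        "Here are the entities and relationships in a community:\n"
        "Entities: " + ", ".join(entities) + "\n"
        "Relationships:\n" + "\n".join(edge_lines) + "\n"
        "Summarize the key themes and relationships in 2-3 sentences."
    )
```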
Step 7: Storage & Indexing
Store the graph, community assignments, summaries, and chunks in a database (e.g., Neo4j for the graph, PostgreSQL for metadata). Create indices on entity names and community IDs for fast lookup during querying.
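For small corpora, a single SQLite database can stand in for the Neo4j-plus-PostgreSQL split described above. The schema below is an illustrative assumption, not the library's actual storage layout:

```python
import sqlite3

def init_store(path=":memory:"):
    """Create tables for entities, edges, and community summaries,
    with indices on entity name and community ID for fast lookup."""
    con = sqlite3.connect(path)
    con.executescript("""
        CREATE TABLE entities (
            name TEXT PRIMARY KEY, type TEXT, community_id TEXT);
        CREATE TABLE edges (
            source TEXT, relation TEXT, target TEXT, weight INTEGER);
        CREATE TABLE summaries (
            community_id TEXT PRIMARY KEY, level INTEGER, summary TEXT);
        CREATE INDEX idx_edges_source ON edges (source);
        CREATE INDEX idx_entities_comm ON entities (community_id);
    """)
    return con
```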
Local Search: Focused on entities mentioned in the query. The system retrieves the query entities from the graph, finds their immediate neighbors (1-hop and 2-hop relationships), and collects summaries for those entities and their communities. Local search excels at answering entity-specific questions and understanding relationships around known entities. Example: "What are Alice Chen's roles and relationships?" Local search finds Alice, retrieves her connections (works_at, worked_at, etc.), and synthesizes the response.
Global Search: Answers questions about overall themes or patterns in the corpus without relying on specific entities. The system retrieves relevant community summaries based on semantic similarity to the query. For example, "What are the main competitors in the market?" Global search retrieves communities related to competition, companies, and markets, synthesizes their summaries, and generates an answer. Global search is better for exploratory questions and discovering patterns.
Drift Search: A hybrid mode where local and global results are blended. Start with local entity-centric results, then augment with global community summaries if coverage is low.
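The neighbour expansion behind local search is a bounded breadth-first traversal. A sketch over a plain adjacency-dict graph (the function name is illustrative):

```python
from collections import deque

def k_hop_neighbors(adj, seeds, k=2):
    """Collect every node within k hops of the seed entities,
    recording each node's hop distance (BFS from the seeds)."""
    seen = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # at the hop limit; do not expand further
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return seen
```

In a full system, the returned node set would then be joined against entity descriptions and community summaries to assemble the context window.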
Microsoft's GraphRAG library (https://github.com/microsoft/graphrag) is open-source and designed for easy integration. Here's a typical workflow:
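With recent versions of the library, the workflow looks roughly like this (command names have changed across releases; older versions used `python -m graphrag.index`, so check the repository's docs for your version):

```shell
pip install graphrag

# Scaffold a workspace: creates settings.yaml and .env under ./ragtest
graphrag init --root ./ragtest

# Put source documents in ./ragtest/input, set your API key in .env,
# then build the index (extraction, graph, community summaries)
graphrag index --root ./ragtest

# Query in local (entity-centric) or global (theme-level) mode
graphrag query --root ./ragtest --method local "What are Alice Chen's roles and relationships?"
graphrag query --root ./ragtest --method global "What are the main themes in the corpus?"
```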
Configuration Options: LLM choice (OpenAI, local Llama, etc.), chunking strategy, entity/relationship types to extract, community detection parameters (algorithm, number of levels), and summarization batch size. Larger corpora benefit from more hierarchy levels and finer-grained community detection.
Deployment: Index once (can take hours for large corpora), then serve queries via an API. Cache community summaries to speed up global search. For very large graphs (millions of nodes), consider sharding by topic or document collection.
| Aspect | Standard RAG | GraphRAG |
|---|---|---|
| Retrieval Mechanism | Semantic similarity (embedding-based) | Graph relationships + community summaries |
| Multi-Hop Q&A | Struggles; requires lucky chunk overlap | Strong; follows graph paths naturally |
| Hallucination Rate | Higher; less grounded in structure | Lower; grounded in explicit graph |
| Coverage on Complex Questions | Lower; limited context per query | Higher; retrieves all related communities |
| Indexing Cost | Low; just chunk and embed | High; LLM-based extraction and summarization |
| Query Speed | Fast; vector similarity search | Medium; graph traversal + LLM synthesis |
| Interpretability | Low; black-box embeddings | High; explicit entities and relationships |
| Domain Adaptation | Fine-tune embeddings | Define entity/relationship types and taxonomies |
Cost-Benefit Analysis: GraphRAG's higher indexing cost (LLM extraction and summarization) is justified for corpora where multi-hop reasoning and coverage are critical (e.g., legal documents, scientific literature, enterprise knowledge bases). For simple fact lookup or small corpora, standard RAG is faster and cheaper. Many organizations use GraphRAG as a second-pass system: standard RAG retrieves initial candidates, then GraphRAG refines with graph-based reasoning.
GraphRAG is ideal when: (1) Your queries require multi-hop reasoning across documents (e.g., "What is the relationship between entities in document A and entities in document B?"). (2) The corpus is large and diverse, with interconnected entities and relationships. (3) Accuracy and low hallucination are critical. (4) Interpretability matters—stakeholders want to know why a particular answer was retrieved. (5) You have the budget for LLM-based indexing (entity and relationship extraction).
Standard RAG is sufficient when: (1) Queries are factual lookups or single-document questions. (2) The corpus is small or homogeneous. (3) Speed and cost are paramount. (4) Chunk-level retrieval provides sufficient context. (5) Your embedding model is well-aligned with your domain.
Concrete Use Cases:
GraphRAG excels in: legal document analysis (cross-referencing contracts, precedents, parties), scientific literature synthesis (connecting research across papers and labs), enterprise knowledge management (connecting people, projects, and organizational relationships), competitive intelligence (tracking company relationships, partnerships, market dynamics), and investigative journalism (finding connections between actors, events, and organizations).
Implementation Timeline: Standard RAG (1-2 days to implement and deploy). GraphRAG (2-4 weeks: design entity/relationship schema, extract and build graph, tune community detection, validate results). Budget accordingly.
Microsoft GraphRAG's indexing pipeline is expensive by design: it runs LLM-based entity extraction, relationship summarization, and community detection (the Leiden algorithm) on the full corpus. For a 10M-token corpus, expect 4–8 hours of indexing time and $40–80 in API costs at Haiku pricing. To reduce this: (1) chunk aggressively (512 tokens is usually sufficient for entity extraction); (2) use a cheaper model for the entity extraction passes and run the expensive summarization only on top-level community reports; (3) filter the input corpus to the documents most relevant to anticipated query types, since GraphRAG scales quadratically in entity count, not linearly.
For query latency, global search is slower than local search because it aggregates community reports across the entire graph. Profile your top query patterns and pre-cache community report summaries for the most common anchor entities. Entity resolution is a hidden bottleneck: if the same concept appears under multiple surface forms (e.g. "GPT-4", "GPT4", "OpenAI GPT-4"), the graph will fragment rather than consolidate. Run a deduplication pass on entity names before building the graph — even a simple fuzzy-match threshold of 0.85 cosine similarity dramatically reduces fragmentation and improves global search quality.
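A greedy deduplication pass can be sketched with stdlib string similarity. Here `difflib`'s ratio is a crude stand-in for the embedding cosine similarity mentioned above, and the normalization and threshold are illustrative:

```python
from difflib import SequenceMatcher

def dedupe_entities(names, threshold=0.85):
    """Map each entity name to a canonical form: the first earlier
    name whose normalized string similarity exceeds the threshold."""
    def norm(s):
        return s.lower().replace("-", "").replace(" ", "")

    canonical, mapping = [], {}
    for name in names:
        for canon in canonical:
            if SequenceMatcher(None, norm(name), norm(canon)).ratio() >= threshold:
                mapping[name] = canon
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping
```

Note that pure string matching merges "GPT-4" with "GPT4" but not with "OpenAI GPT-4"; embedding-based similarity handles such longer surface-form variants better.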