Track entities — people, organisations, products, concepts — and their attributes across a conversation, building a structured knowledge graph that the agent can query and update as the dialogue evolves.
Standard conversation buffers remember what was said. Entity memory remembers facts about things. The distinction matters: in a long conversation, you might mention "our CTO Sarah" in turn 2, "Sarah's team" in turn 8, "the engineering budget" in turn 15. A buffer memory might compress these into a vague summary. Entity memory tracks Sarah as an entity with attributes (role=CTO, team=engineering), so when you reference her 20 turns later, the model has structured, precise context.
Entity memory builds a lightweight knowledge graph from the conversation — a set of entities (nodes) with attributes (properties) and relationships (edges). This is especially valuable for: customer support (track customer's product, issue, history), personal assistants (track user's projects, people, preferences), and coding assistants (track codebase components, dependencies, recent changes).
```python
import anthropic, json

client = anthropic.Anthropic()

def extract_entities(text: str, existing_entities: dict) -> dict:
    '''Extract or update entities from new text.'''
    existing_str = json.dumps(existing_entities, indent=2) if existing_entities else "None yet."
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,
        messages=[{"role": "user", "content": f'''Extract entities and their attributes from the text.
Return JSON merging new info with existing entities.
Format: {{"entity_name": {{"type": "person/org/product/...", "attributes": {{...}}}}}}

Existing entities:
{existing_str}

New text: {text}

Updated entities JSON:'''}],
    )
    try:
        return json.loads(response.content[0].text)
    except json.JSONDecodeError:
        return existing_entities  # fallback if parsing fails

# Test
entities = {}
entities = extract_entities("Our CTO Sarah Chen leads a team of 15 engineers.", entities)
entities = extract_entities("Sarah's team is working on the new API project.", entities)
entities = extract_entities("The API project has a Q2 deadline and $500k budget.", entities)
print(json.dumps(entities, indent=2))
# {
#   "Sarah Chen": {"type": "person", "attributes": {"role": "CTO", "team_size": 15}},
#   "API project": {"type": "project", "attributes": {"team": "Sarah Chen's", "deadline": "Q2", "budget": "$500k"}}
# }
```
```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Entity:
    name: str
    entity_type: str
    attributes: dict[str, Any] = field(default_factory=dict)
    mentioned_in_turns: list[int] = field(default_factory=list)

    def update(self, new_attrs: dict):
        '''Merge new attributes, overwriting on conflict.'''
        self.attributes.update(new_attrs)

class EntityStore:
    def __init__(self):
        self._entities: dict[str, Entity] = {}

    def upsert(self, name: str, entity_type: str, attributes: dict, turn: int = 0):
        if name not in self._entities:
            self._entities[name] = Entity(name, entity_type, attributes, [turn])
        else:
            self._entities[name].update(attributes)
            self._entities[name].mentioned_in_turns.append(turn)

    def get(self, name: str) -> Entity | None:
        return self._entities.get(name)

    def get_context_for_prompt(self, relevant_entities: list[str]) -> str:
        '''Format entity facts for injection into a prompt.'''
        lines = []
        for name in relevant_entities:
            entity = self._entities.get(name)
            if entity:
                attrs = ", ".join(f"{k}: {v}" for k, v in entity.attributes.items())
                lines.append(f"{name} ({entity.entity_type}): {attrs}")
        return "\n".join(lines)

# Usage in a conversation
store = EntityStore()
store.upsert("Sarah Chen", "person", {"role": "CTO", "team_size": 15}, turn=2)
store.upsert("API Project", "project", {"lead": "Sarah Chen", "deadline": "Q2"}, turn=5)

# Before generating a response, inject relevant entity context
context = store.get_context_for_prompt(["Sarah Chen", "API Project"])
print(context)
```
```python
from langchain.memory import ConversationEntityMemory
from langchain_anthropic import ChatAnthropic
from langchain.chains import ConversationChain

llm = ChatAnthropic(model="claude-haiku-4-5-20251001")

# ConversationEntityMemory automatically extracts and tracks entities
memory = ConversationEntityMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory, verbose=True)

# The entity store is updated automatically after each turn
chain.predict(input="Hi, I'm Deepak. I work on a GenAI project called Mindmap.")
chain.predict(input="My colleague Priya is the ML lead on Mindmap.")
chain.predict(input="What do you know about Mindmap?")
# Model knows: Mindmap is a GenAI project, Deepak works on it, Priya is the ML lead

# Inspect the entity store
print(memory.entity_store.store)
# {"Deepak": "Deepak works on a GenAI project called Mindmap.",
#  "Priya": "Priya is the ML lead on Mindmap.",
#  "Mindmap": "Mindmap is a GenAI project..."}
```
LangChain's ConversationEntityMemory uses a separate LLM call after each turn to extract and update entity summaries. The entity store is injected at the start of the next prompt as structured context.
For complex entity relationships, upgrade from a flat attribute store to a proper knowledge graph. LangChain's ConversationKGMemory extracts (subject, predicate, object) triples:
```python
from langchain.memory import ConversationKGMemory
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-haiku-4-5-20251001")
memory = ConversationKGMemory(llm=llm)

memory.save_context(
    {"input": "Sarah Chen is the CTO and leads the API team."},
    {"output": "Got it."},
)
memory.save_context(
    {"input": "The API team is working on the payment service."},
    {"output": "Understood."},
)

# Query the knowledge graph
print(memory.load_memory_variables({"input": "What does Sarah do?"})["history"])
# On Sarah Chen: Sarah Chen is CTO. Sarah Chen leads API team.

# Get the raw triples
print(memory.kg.get_triples())
# [("Sarah Chen", "is", "CTO"),
#  ("Sarah Chen", "leads", "API team"),
#  ("API team", "is working on", "payment service")]
```
Entities get updated across turns, and updates can contradict earlier information. Handling conflicts correctly is important for accuracy:
```python
from typing import Any

def merge_entity_attribute(entity: Entity, key: str, new_value: Any) -> None:
    old_value = entity.attributes.get(key)
    if old_value is None:
        entity.attributes[key] = new_value
        return
    if old_value == new_value:
        return  # no conflict

    # Conflict: old_value != new_value
    # Strategy 1: always prefer the latest value (simple)
    entity.attributes[key] = new_value
    entity.attributes[f"{key}_previous"] = old_value  # keep history

    # Strategy 2: ask an LLM to resolve (expensive but more accurate)
    # resolution = ask_llm(f"Entity '{entity.name}' has {key}={old_value}. "
    #                      f"New info says {key}={new_value}. Which is correct?")
    # entity.attributes[key] = resolution
```
For most cases, "latest update wins" is the right default — the user is usually correcting earlier information. Keep the old value in a `*_previous` field for debugging and auditability.
Entity extraction adds latency per turn. Each user message triggers an extra LLM call to extract entities. For high-volume applications, batch entity extraction asynchronously after returning the response, or use a smaller/faster model exclusively for extraction (Haiku for extraction, Sonnet for responses).
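The "extract asynchronously after responding" pattern can be sketched with `asyncio`. In this sketch, `extract_entities_async` is a hypothetical stand-in for the real extraction call (e.g. the Haiku-based `extract_entities` shown earlier); its body exists only to make the timing behaviour observable:

```python
import asyncio

# Hypothetical stand-in for the real extraction model call.
async def extract_entities_async(text: str, store: dict) -> None:
    await asyncio.sleep(0)  # simulate the network round-trip
    if "Sarah Chen" in text:
        store["Sarah Chen"] = {"type": "person"}

async def handle_turn(user_msg: str, store: dict) -> str:
    reply = f"Echo: {user_msg}"  # main-model response returns immediately
    # Fire-and-forget: extraction runs after the reply is produced,
    # so it adds no latency to the user-visible turn.
    asyncio.create_task(extract_entities_async(user_msg, store))
    return reply

async def demo():
    store: dict = {}
    reply = await handle_turn("Our CTO Sarah Chen joined in 2020.", store)
    await asyncio.sleep(0.01)  # give the background task time to finish
    return reply, store
```

In production, hold a reference to the created task (or use `asyncio.TaskGroup`) so it cannot be garbage-collected mid-flight, and add error handling for failed extractions.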
Entity proliferation degrades quality. After a long conversation, the entity store might contain dozens of entities, most no longer relevant. Add a relevance threshold: only inject the top N entities most similar to the current query (using embedding similarity), not all of them.
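A minimal sketch of the top-N filter, using a toy token-overlap score in place of real embedding cosine similarity (the scoring function and entity names are illustrative assumptions, not a production ranker):

```python
def score(query: str, entity_name: str, attrs: dict) -> float:
    # Toy relevance score: token overlap between the query and the entity's
    # name/attributes. A real system would use embedding cosine similarity.
    entity_text = f"{entity_name} " + " ".join(map(str, attrs.values()))
    q_tokens = set(query.lower().split())
    e_tokens = set(entity_text.lower().split())
    return len(q_tokens & e_tokens) / max(len(q_tokens), 1)

def top_n_entities(query: str, entities: dict, n: int = 3) -> list[str]:
    # Inject only the N most relevant entities, not the whole store.
    ranked = sorted(entities, key=lambda name: score(query, name, entities[name]),
                    reverse=True)
    return ranked[:n]

entities = {
    "Sarah Chen": {"role": "CTO"},
    "API project": {"deadline": "Q2"},
    "Old vendor": {"status": "churned"},
}
top_n_entities("when is the API project due?", entities, n=2)
# "API project" ranks first; "Old vendor" is filtered out
```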
Ambiguous references are hard. "He said the project is delayed" — who is "he"? Which "project"? Coreference resolution is a hard NLP problem. Either restrict your domain (the agent always knows the current "active entity" from context) or add explicit disambiguation logic.
| Storage Backend | Query Type | Best For | Scale Limit |
|---|---|---|---|
| In-memory dict | Exact key lookup | Single session, prototype | ~1,000 entities |
| Redis | Exact + prefix scan | Multi-session, low latency | ~1M entities |
| Vector DB (Chroma, Pinecone) | Semantic similarity | Fuzzy entity matching, synonyms | Unlimited |
| Knowledge graph (Neo4j) | Relationship traversal | Complex entity relationships | ~100M nodes |
Entity memory degrades gracefully when extraction quality is imperfect, but it fails catastrophically when entity names are inconsistent across turns. Implement an entity normalisation step before storage: lowercase names, resolve common abbreviations (e.g. "NYC" to "New York City"), and strip titles (e.g. "Dr. Smith" to "Smith"). A simple fuzzy-match lookup at read time (Levenshtein distance threshold of 2) further reduces the impact of minor spelling variations without requiring perfect extraction.
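A sketch of the normalisation step plus a read-time fuzzy lookup. The abbreviation table and title regex are illustrative assumptions to extend per domain; the edit distance is the classic dynamic-programming Levenshtein:

```python
import re

ABBREVIATIONS = {"nyc": "new york city"}  # extend per domain
TITLES = re.compile(r"^(dr|mr|ms|mrs|prof)\.?\s+", re.IGNORECASE)

def normalise(name: str) -> str:
    # Strip titles, lowercase, then expand known abbreviations.
    name = TITLES.sub("", name.strip()).lower()
    return ABBREVIATIONS.get(name, name)

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[-1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_get(store: dict, name: str, max_distance: int = 2):
    # Exact hit on the normalised key first; fall back to fuzzy match.
    key = normalise(name)
    if key in store:
        return store[key]
    for existing in store:
        if levenshtein(key, existing) <= max_distance:
            return store[existing]
    return None

store = {normalise("Dr. Smith"): {"role": "advisor"}}
fuzzy_get(store, "Smiht")  # typo within distance 2 still resolves
```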
For production multi-user systems, scope all entity reads and writes to the current user session or user ID. Entity memory that leaks across users creates a subtle privacy issue where one user's stated preferences or personal details influence another user's session. Use separate Redis key namespaces (`user:{id}:entity:{name}`) to enforce hard isolation at the data layer rather than relying on application-level filtering.
Entity memory systems face a fundamental tension between recall speed and completeness. Shallow entity stores retrieve quickly but miss contextual relationships; deep graph traversals capture nuance but introduce latency. Production systems often tier this: hot-path entity lookups use a fast key-value store, while background processes maintain a richer graph that gets consulted for complex reasoning chains where latency is acceptable.
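The tiering described above can be sketched as a two-path lookup: the hot path returns flat attributes from a key-value map, and an opt-in deep path expands relationships (here a one-hop walk over an in-memory adjacency list; structure and names are illustrative):

```python
class TieredEntityLookup:
    """Hot path: flat dict of entity attributes.
    Cold path: relationship expansion, consulted only when the
    caller opts into deeper context and can tolerate latency."""
    def __init__(self):
        self.hot: dict[str, dict] = {}                      # name -> attributes
        self.edges: dict[str, list[tuple[str, str]]] = {}   # name -> [(predicate, object)]

    def add(self, name: str, attrs: dict, relations=()) -> None:
        self.hot[name] = attrs
        self.edges[name] = list(relations)

    def lookup(self, name: str, deep: bool = False) -> dict:
        result = {"attributes": self.hot.get(name, {})}
        if deep:  # latency-tolerant path: one-hop relationship expansion
            result["relations"] = self.edges.get(name, [])
        return result

tiered = TieredEntityLookup()
tiered.add("Sarah Chen", {"role": "CTO"}, [("leads", "API team")])
tiered.lookup("Sarah Chen")             # fast: attributes only
tiered.lookup("Sarah Chen", deep=True)  # slower path includes relations
```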
One practical challenge with entity memory is entity resolution — determining that "Elon Musk", "Musk", and "the Tesla CEO" all refer to the same entity. Embedding-based similarity helps but requires careful tuning of thresholds; too low causes false merges, too high causes fragmentation. Canonical name normalization during extraction reduces this burden significantly for structured domains.
Decay and forgetting mechanisms in entity memory prevent stale information from polluting agent reasoning. Entities that have not been referenced recently can be demoted to cold storage or have their confidence scores reduced. When contradictory information arrives — a contact changes their role, a project gets renamed — versioned entity records allow the agent to distinguish the current state from historical states rather than overwriting the older information entirely.