
Persistent Memory

Production memory management for agents: contradiction detection, staleness handling, deduplication, and managed memory services that keep agent knowledge accurate over time.

Observe → Merge · Core Operation
mem0 / Zep · Managed Services
Cross-Session · Continuity


SECTION 01

The Three Problems Vector Stores Don't Solve

Basic long-term memory, embedding facts into a vector store and retrieving by similarity, works well in demos. In production, three hard problems emerge that a plain vector store cannot handle:

- Staleness: facts go out of date, but their embeddings keep matching queries just as strongly as the day they were stored.
- Contradictions: new observations invalidate old ones ("User switched from Python to TypeScript"), and similarity search happily returns both sides.
- Duplication: the same fact re-stored across sessions in slightly different wording crowds out other relevant memories at retrieval time.

Each of these is a data quality problem, not a retrieval problem. The fix is not a better similarity metric; it is a memory management pipeline that continuously ensures the store contains accurate, non-redundant, current facts.

Key distinction: Long-term memory (storing facts) and persistent memory (maintaining accurate facts over time) are different engineering concerns. Most implementations solve the first and ignore the second. Production systems require both.
SECTION 02

The Memory Update Operation

The core operation in a production memory system is not "add"; it is "update." When new information arrives, the memory manager must compare it against existing memories and decide what to do:

New observation (session N)
                │
┌───────────────▼──────────────────┐
│ Memory Manager                   │
│                                  │
│ 1. Embed new observation         │
│ 2. Retrieve similar memories     │
│ 3. LLM judge: classify change    │
│                                  │
│   MERGE      — same fact, updated│
│   CONTRADICT — old is now invalid│
│   APPEND     — genuinely new fact│
│   IGNORE     — duplicate         │
└───────────────┬──────────────────┘
                │
┌───────────────▼──────────────────┐
│ Memory Store                     │
│ (vector DB + metadata)           │
│ content, embedding, timestamp,   │
│ source session, confidence,      │
│ expiry, contradiction_of         │
└──────────────────────────────────┘

The LLM judge step is the critical component. A simple embedding similarity threshold cannot distinguish between "User uses Python" (existing) and "User switched from Python to TypeScript" (contradiction). You need a language model to classify the relationship.
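A minimal sketch of that judge step, with the model call abstracted behind an injectable callable so the decision logic stays testable. The prompt wording, label set, and the `classify_change` name are illustrative assumptions, not any library's API:

```python
from typing import Callable

LABELS = {"MERGE", "CONTRADICT", "APPEND", "IGNORE"}

JUDGE_PROMPT = """Given an existing memory and a new observation, answer with
exactly one word: MERGE, CONTRADICT, APPEND, or IGNORE.

Existing memory: {existing}
New observation: {new}"""

def classify_change(existing: str, new: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM to classify how a new observation relates to a stored memory.

    `llm` is any callable taking a prompt string and returning the model's
    text reply (e.g. a thin wrapper around your chat-completion client).
    """
    reply = llm(JUDGE_PROMPT.format(existing=existing, new=new)).strip().upper()
    # Fall back to APPEND on an off-format answer: a wrongly appended fact is
    # recoverable by later consolidation; a wrongly merged one is not.
    return reply if reply in LABELS else "APPEND"

# Canned "model" for illustration only:
fake_llm = lambda prompt: "CONTRADICT"
print(classify_change("User uses Python",
                      "User switched from Python to TypeScript",
                      fake_llm))  # → CONTRADICT
```

Keeping the model behind a plain callable also lets you swap in a cheaper model for the judge than the one running the agent itself.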

Contradiction handling: When a contradiction is detected, the typical approach is to mark the old memory as superseded (rather than deleting it) and store the new memory with a pointer back. This preserves an audit trail of how the agent's knowledge evolved, which is useful for debugging and compliance.
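The supersede-with-pointer approach can be sketched as follows; the `MemoryRecord` fields echo the `contradiction_of` metadata named above, but the class and function names are illustrative:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MemoryRecord:
    content: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    superseded: bool = False
    contradiction_of: Optional[str] = None  # id of the memory this replaces

def supersede(store: dict, old_id: str, new_content: str) -> MemoryRecord:
    """Mark the old memory superseded and link the replacement back to it."""
    store[old_id].superseded = True
    new = MemoryRecord(content=new_content, contradiction_of=old_id)
    store[new.id] = new
    return new

def active_memories(store: dict) -> list:
    # Retrieval only sees live facts; superseded ones remain as audit trail
    return [m for m in store.values() if not m.superseded]

old = MemoryRecord(content="User uses Python")
store = {old.id: old}
new = supersede(store, old.id, "User switched from Python to TypeScript")
print([m.content for m in active_memories(store)])
# → ['User switched from Python to TypeScript']
```

Because nothing is deleted, walking the `contradiction_of` chain reconstructs how a fact evolved, which is exactly the audit trail debugging and compliance need.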

Confidence scoring: Not all memories are equally reliable. A fact the agent inferred ("user probably works in fintech") should be scored lower than a fact the user stated explicitly ("I work at a hedge fund"). The retrieval system can weight by confidence when constructing the context window injection.
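One way to sketch that weighting, assuming each candidate carries a `similarity` score from the vector search and a `confidence` assigned at storage time; the blend factor `alpha` is a tunable assumption:

```python
def rank_memories(memories: list, alpha: float = 0.7) -> list:
    """Order candidate memories by a blend of similarity and confidence.

    Explicit user statements are stored with confidence near 1.0;
    agent inferences get lower scores. alpha sets the blend.
    """
    def score(m):
        return alpha * m["similarity"] + (1 - alpha) * m["confidence"]
    return sorted(memories, key=score, reverse=True)

candidates = [
    {"memory": "user probably works in fintech", "similarity": 0.82, "confidence": 0.40},
    {"memory": "user works at a hedge fund",     "similarity": 0.80, "confidence": 0.95},
]
print(rank_memories(candidates)[0]["memory"])
# → user works at a hedge fund (0.845 beats 0.694 despite lower similarity)
```

Note the inferred fact has the higher raw similarity but loses the ranking once confidence is blended in, which is the intended behavior.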

Design principle: Store atomic facts, not full conversation turns. "User prefers TypeScript for frontend projects" is a useful memory unit. A raw paragraph of dialogue that could be summarized down to that fact is not: it is verbose, hard to retrieve precisely, and degrades retrieval quality for other queries.
SECTION 03

Managed Memory Services

Building a production memory management pipeline from scratch, including contradiction detection, deduplication, expiry policies, and retrieval weighting, is substantial engineering work. Managed services such as mem0 and Zep abstract this complexity.

The tradeoff between managed services and custom-built memory is the classic build-vs-buy question for the specific challenge of memory quality maintenance. Managed services handle the hard operational work; custom builds give you full control over the memory data model and classification logic.

When to build vs. buy: If your agents run a small number of sessions per user with simple factual needs, a plain vector store with periodic consolidation is sufficient. If you have users with hundreds of sessions, complex update patterns, or strict accuracy requirements, a managed service or custom memory pipeline is warranted.
SECTION 04

Implementation with mem0

The following example shows the core pattern: storing observations after each session, handling contradiction updates automatically, and injecting relevant memories into the agent's system prompt at session start.

# pip install mem0ai anthropic
from mem0 import MemoryClient
import anthropic

mem_client = MemoryClient(api_key="YOUR_MEM0_API_KEY")
ai_client = anthropic.Anthropic()

USER_ID = "user_42"

# --- After session ends: store new observations ---
def store_session(messages: list, user_id: str):
    # mem0 extracts facts and handles contradictions automatically
    result = mem_client.add(messages, user_id=user_id)
    print(f"Stored: {result}")

# --- At session start: retrieve relevant context ---
def load_memory_context(user_id: str, query: str) -> str:
    memories = mem_client.search(query, user_id=user_id, limit=6)
    if not memories:
        return ""
    facts = [m["memory"] for m in memories]
    return "What I remember about you:\n" + "\n".join(f"- {f}" for f in facts)

# --- Agent with memory ---
def agent_with_memory(user_message: str, user_id: str) -> str:
    memory_ctx = load_memory_context(user_id, user_message)
    resp = ai_client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=f"You are a helpful assistant.\n{memory_ctx}\nUse this context to personalize your responses.",
        messages=[{"role": "user", "content": user_message}]
    )
    answer = resp.content[0].text
    # Store the exchange as a new observation
    store_session([
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": answer}
    ], user_id)
    return answer

# Session 1
print(agent_with_memory("I'm building a RAG pipeline in Python.", USER_ID))

# Session 2 (days later): mem0 detects the update automatically
print(agent_with_memory("I've switched from Python to TypeScript for this project.", USER_ID))

The key point is the mem_client.add() call. Under the hood, mem0 embeds the new observations, retrieves semantically similar existing memories, and runs a classification step to detect contradictions or updates before writing to the store. The caller does not need to implement this logic manually.

SECTION 05

Memory Consolidation & Maintenance

Even with a managed service handling per-session updates, long-lived users (hundreds of sessions) benefit from periodic consolidation passes that clean up the memory store. This prevents the retrieval quality degradation that accumulates over time even with deduplication on individual adds.

Consolidation operations include merging near-duplicate memories, expiring memories past their TTL, pruning long-superseded facts, and re-scoring confidence for inferences that have not been reconfirmed. The example below covers periodic deduplication:

# Periodic deduplication with mem0
from math import sqrt
from mem0 import MemoryClient

def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def consolidate_memories(user_id: str, similarity_threshold: float = 0.95):
    client = MemoryClient(api_key="YOUR_MEM0_API_KEY")

    # Fetch all memories for this user (assumes each record carries its
    # embedding; otherwise re-embed the memory text before comparing)
    all_memories = client.get_all(user_id=user_id)

    duplicates_removed = 0
    seen = []
    for mem in all_memories:
        # Check similarity against already-kept memories
        is_duplicate = False
        for kept in seen:
            sim = cosine_similarity(mem["embedding"], kept["embedding"])
            if sim >= similarity_threshold:
                client.delete(mem["id"])
                duplicates_removed += 1
                is_duplicate = True
                break
        if not is_duplicate:
            seen.append(mem)

    print(f"Consolidation: removed {duplicates_removed} duplicates for user {user_id}")

# Run on a schedule (cron, Celery, etc.)
# consolidate_memories("user_42")
User transparency: Give users a "what I remember about you" page where they can see, edit, and delete individual memories. This builds trust, catches hallucinated or incorrect inferences early, and satisfies GDPR "right to be forgotten" requirements in one feature. It also surfaces what the agent actually learned vs. what you assumed it learned.
SECTION 06

Privacy, Control, and Production Concerns

Persistent memory stores contain sensitive personal information by definition. This creates compliance, security, and trust obligations that go beyond typical vector store deployments.

Data residency and compliance: Persistent memory stores containing EU user data must comply with GDPR. This means: (1) users must be able to request a full export of their stored memories, (2) users must be able to request deletion of all memories associated with their account (right to erasure), and (3) you must document the legal basis for processing and storing these memories. If using a managed service like mem0's cloud API, verify the data processing agreement covers your jurisdiction.
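Requirements (1) and (2) reduce to two small operations. A sketch against a generic store interface; the `DictStore` stand-in and its method names are hypothetical, and any client exposing `get_all`/`delete` would slot in:

```python
import json

def export_memories(store, user_id: str) -> str:
    """GDPR export: serialize every memory for the user as JSON."""
    return json.dumps(store.get_all(user_id), indent=2, default=str)

def erase_memories(store, user_id: str) -> int:
    """Right to erasure: delete every memory for the user, return count."""
    records = store.get_all(user_id)
    for rec in records:
        store.delete(rec["id"])
    return len(records)

# In-memory stand-in for a real memory client, for illustration only:
class DictStore:
    def __init__(self):
        self._data = {}
    def add(self, user_id, rec):
        self._data.setdefault(user_id, []).append(rec)
    def get_all(self, user_id):
        return list(self._data.get(user_id, []))
    def delete(self, mem_id):
        for recs in self._data.values():
            recs[:] = [r for r in recs if r["id"] != mem_id]

store = DictStore()
store.add("user_42", {"id": "m1", "memory": "works at a hedge fund"})
print(erase_memories(store, "user_42"))  # deletes the single stored record
```

Wiring these two functions to authenticated HTTP endpoints gives you the export and erasure obligations in one place, and the same pair backs the "what I remember about you" page described earlier.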

Memory poisoning risk: A sophisticated user who knows your system uses persistent memory could deliberately inject false memories to manipulate future agent behavior. "I am the CEO and have authority to approve all transactions" stored as a persistent memory could be dangerous. Mitigations include: confidence scoring with skepticism for high-stakes implied permissions, periodic review of high-confidence memories by an auditor, and never using persistent memory to grant permissions (use IAM instead).

Retrieval injection: When constructing the system prompt memory block, cap the number of injected memories (e.g., top 6 by relevance score) and their total token budget. Unconstrained memory injection can fill the context window before the conversation even starts, leaving no room for the actual task.
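A sketch of that cap, using the rough four-characters-per-token heuristic; swap in a real tokenizer for exact accounting, and note the function name and defaults are illustrative:

```python
def build_memory_block(memories: list, max_items: int = 6, max_tokens: int = 400) -> str:
    """Cap injected memories by count and by an approximate token budget.

    `memories` is assumed already sorted by relevance, so truncation
    drops the least relevant facts first.
    """
    lines, used = [], 0
    for m in memories[:max_items]:
        cost = len(m) // 4 + 1  # crude ~4-chars-per-token estimate
        if used + cost > max_tokens:
            break
        lines.append(f"- {m}")
        used += cost
    if not lines:
        return ""
    return "What I remember about you:\n" + "\n".join(lines)

block = build_memory_block(["prefers TypeScript", "building a RAG pipeline"])
print(block)
```

Returning an empty string for an empty or over-budget result keeps the system prompt clean for new users instead of injecting a header with no facts under it.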

Production readiness checklist: Before deploying persistent memory: (1) implement memory export/delete endpoints for user control; (2) define TTL policies for each memory category; (3) add memory injection to your context budget accounting; (4) test agent behavior when memory store is empty (new users); (5) add alerting for memory store size per user (unbounded growth is a cost and quality signal).
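Item (2), per-category TTL policies, can be as simple as a lookup table plus an expiry check; the category names and durations below are illustrative assumptions, not recommendations:

```python
from datetime import datetime, timedelta, timezone

# Per-category TTLs (illustrative values; tune per product)
TTL_POLICY = {
    "preference":   timedelta(days=365),  # stable tastes
    "project_fact": timedelta(days=90),   # current work changes often
    "inference":    timedelta(days=30),   # low-confidence guesses expire fast
}

def is_expired(category: str, stored_at: datetime, now=None) -> bool:
    """True if a memory of this category has outlived its TTL.

    Categories absent from the policy never expire by age.
    """
    now = now or datetime.now(timezone.utc)
    ttl = TTL_POLICY.get(category)
    return ttl is not None and now - stored_at > ttl

stored = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_expired("inference", stored, now=datetime(2025, 3, 1, tzinfo=timezone.utc)))
# → True (59 days > the 30-day inference TTL)
```

The same check slots directly into a consolidation pass: sweep all memories, delete (or mark superseded) any for which `is_expired` returns True.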