Agents · Memory · Anthropic

Context Compaction

Anthropic's server-side summarisation system (2026) that makes agent conversations effectively unlimited, with no client-side context management required

Transparent
Server-Side
200+ Turns
Agent Runs
Feb 2026
API Launch


SECTION 01

What Is Context Compaction?

Context Compaction is a server-side feature of the Anthropic API (launched February 2026) that allows agent conversations to continue effectively indefinitely. When a conversation approaches the model's context window limit, the API automatically summarises older turns into a compact representation, then continues the conversation using that summary plus the recent uncompressed turns.

The result is that long-running agents (research agents, coding agents working on large tasks, document processing pipelines) no longer need to implement context management logic client-side. Compaction happens transparently: the agent simply keeps sending messages, and the API handles the window overflow.

This is a fundamentally different approach from all prior context management techniques, because the summarisation is performed by the same model family that will use the summary. The model is optimised to preserve the information most relevant to coherent continuation of the specific task, not just a generic summary of what was said.

Why this matters for production agents: Before compaction, building a long-running agent required non-trivial engineering: detecting when the window was full, deciding what to drop, and writing a summarisation prompt that preserved task state correctly. Compaction eliminates this entire problem class for most agent use cases.
SECTION 02

How Compaction Works

The compaction process operates on the server side, invisibly to the client, using the following flow:

   Long agent conversation (100+ turns)
                 │
┌────────────────▼─────────────────────┐
│ Approaching context limit            │
└────────────────┬─────────────────────┘
                 │
┌────────────────▼─────────────────────┐
│ Compaction API (server-side)         │
│                                      │
│ Summarise turns 1 .. N-K             │
│ Preserve: decisions, facts,          │
│ constraints, current task state,     │
│ tool call outcomes                   │
│                                      │
│ Output: compact summary + turns      │
│ N-K+1 .. N (recent context raw)      │
└────────────────┬─────────────────────┘
                 │
┌────────────────▼─────────────────────┐
│ Model continues with:                │
│ [compact summary] + [recent turns]   │
│ Effectively unlimited conversation   │
└──────────────────────────────────────┘

The key design choice is that recent turns are always kept uncompressed. Compaction summarises older history, not the most recent context. This preserves the full fidelity of what is happening right now, while compressing older context that is less immediately relevant.

Compaction can trigger multiple times over the course of a very long run, progressively compressing older summaries. The model always has access to a full semantic representation of the conversation history; only the token volume is reduced.
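The progressive scheme can be illustrated with a toy client-side simulation. This is purely illustrative: real compaction runs server-side and the summary is model-generated, so `summarise` and `compact` here are stand-ins for behaviour the API performs for you.

```python
def summarise(turns: list) -> str:
    # Stand-in for the model-generated summary produced server-side.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list, keep_last: int = 3, limit: int = 6) -> list:
    """Keep the last `keep_last` turns raw; fold everything older
    (including any previous summary slot) into one new summary."""
    if len(history) <= limit:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    return [summarise(older)] + recent

history = [f"turn {i}" for i in range(1, 11)]
compact(history)
# -> ['[summary of 7 earlier turns]', 'turn 8', 'turn 9', 'turn 10']
```

Applied repeatedly, this mirrors the progressive behaviour described above: the earlier summary slot is itself part of `older` the next time compaction triggers, so older history keeps compressing while recent turns stay raw.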

SECTION 03

Compaction vs. Client-Side Strategies

Prior to compaction, engineers used several client-side strategies to handle context limits in long-running agents, each with significant limitations:

Truncation: drop the oldest turns once the window fills. Simple, but early decisions, constraints, and facts are lost outright.

Client-side summarisation: call the model yourself to summarise older turns. Workable, but it requires a hand-tuned prompt and careful bookkeeping, and a generic summary can miss task-critical state.

External memory and retrieval: store history outside the context window and retrieve it on demand. Robust, but substantial infrastructure for what is often a simple need.

When not to use compaction: Compaction is not the right tool when you need exact verbatim preservation of specific turns, for example when your agent later needs to quote or verify exact wording from an earlier message. For those cases, maintain a separate structured log alongside the conversation and read from it explicitly.
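One way to keep such a log, sketched here as a minimal assumed pattern (the `VerbatimLog` class is illustrative, not part of the API):

```python
class VerbatimLog:
    """Append-only store of exact message text, keyed by turn number,
    kept outside the conversation so compaction cannot paraphrase it."""

    def __init__(self):
        self._entries: dict[int, dict] = {}

    def record(self, turn: int, role: str, text: str) -> None:
        self._entries[turn] = {"role": role, "text": text}

    def quote(self, turn: int) -> str:
        # Exact wording, even after the turn has been compacted away.
        return self._entries[turn]["text"]

log = VerbatimLog()
log.record(3, "user", "Budget must not exceed $12,500.")
log.quote(3)  # -> 'Budget must not exceed $12,500.'
```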
SECTION 04

Implementation

Enabling compaction requires only a single addition to your existing API calls: include "context-compaction-2026-02" in the betas array. The rest of your agent loop is unchanged.

import anthropic

client = anthropic.Anthropic()

def long_running_agent(initial_task: str, tools: list, max_turns: int = 200) -> str:
    """Agent that uses context compaction for arbitrarily long runs."""
    messages = [{"role": "user", "content": initial_task}]

    for turn in range(max_turns):
        # Enable context compaction with the beta header
        resp = client.beta.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
            betas=["context-compaction-2026-02"],  # <-- the only change needed
        )

        # Optional: detect compaction events for observability
        if hasattr(resp, "usage") and hasattr(resp.usage, "compacted_tokens"):
            print(f"  [turn {turn}] Compaction: {resp.usage.compacted_tokens} tokens summarised")

        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            for block in reversed(resp.content):
                if hasattr(block, "text"):
                    return block.text
            break

        # Tool handling is identical to a non-compacted agent
        tool_results = []
        for block in resp.content:
            if block.type == "tool_use":
                result = dispatch_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        if tool_results:
            messages.append({"role": "user", "content": tool_results})

    return "Agent reached max_turns limit."

def dispatch_tool(name: str, args: dict) -> str:
    # Your tool dispatch logic
    return f"[result of {name}]"

The compaction beta header can be combined with other beta features. If you are already using tool use, extended output, or other beta flags, simply add the compaction beta to the existing list.
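For example (the second flag below is a hypothetical placeholder for whatever other beta you already use; only `context-compaction-2026-02` comes from this article):

```python
betas = [
    "context-compaction-2026-02",  # compaction (this article)
    "some-other-beta-2026-01",     # hypothetical: your existing beta flag
]
# resp = client.beta.messages.create(..., betas=betas)
```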

Cost note: Compaction does consume tokens, since the summarisation step is itself a model call. For most long-running agents, this overhead is negligible compared to the cost of the full agent run. However, if you are running many short agents (under 15 turns each) that rarely hit the context limit, the compaction overhead may not be justified. Enable it selectively based on expected run length.
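A simple way to enable compaction selectively, assuming you can estimate run length up front (the helper name and the 15-turn threshold are illustrative, taken from the cost note above):

```python
def betas_for_run(expected_turns: int, threshold: int = 15) -> list:
    # Enable compaction only when the run is long enough for the
    # summarisation overhead to pay for itself.
    return ["context-compaction-2026-02"] if expected_turns > threshold else []

betas_for_run(200)  # -> ['context-compaction-2026-02']
betas_for_run(8)    # -> []
```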
SECTION 05

What Compaction Preserves (and What It Doesn't)

Understanding the information-preservation guarantees of compaction is critical for designing reliable agents.

Compaction preserves (semantic level):

- Decisions made during the run and the constraints behind them
- Facts established earlier in the conversation
- Current task state and the overall goal
- Tool call outcomes, at the level of what was learned

Compaction does not preserve (verbatim level):

- Exact wording of earlier messages (you cannot reliably quote or verify verbatim text from compacted turns)
- Exact tool call history (precisely which calls were made, how many times, and with which arguments)

The practical implication: if your agent needs to know "did I already call tool X on file Y?", do not rely on compaction to preserve that. Maintain a separate structured state dictionary that the agent explicitly updates and reads, for example a set of processed file paths that gets included in the system prompt. Compaction is for semantic continuity; structured state is for exact bookkeeping.

# Pattern: structured state + compaction
# The agent maintains explicit state alongside the compacted conversation

import json
import anthropic

client = anthropic.Anthropic()

def agent_with_state(task: str, files_to_process: list) -> str:
    # Explicit state that compaction does not need to preserve
    state = {
        "processed_files": [],
        "errors": [],
        "findings": [],
    }

    system_prompt = f"""You are a code analysis agent.
Task: {task}

Current state (updated after each file):
{{state_json}}

Always read the current state before deciding what to do next.
Update the state after completing each file."""

    messages = [{"role": "user", "content": f"Process these files: {files_to_process}"}]

    for turn in range(100):
        # Inject fresh state into every system prompt
        current_system = system_prompt.replace("{state_json}", json.dumps(state, indent=2))

        resp = client.beta.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            system=current_system,
            messages=messages,
            betas=["context-compaction-2026-02"],
        )

        # ... handle tool calls, update state explicitly ...

        if resp.stop_reason == "end_turn":
            break

    return json.dumps(state["findings"], indent=2)
SECTION 06

Combining Compaction with Checkpointing

Compaction handles one failure mode of long-running agents (hitting the context window limit) but not the other (process failure, API timeout, cost overrun mid-run). For truly robust long-running agents, compaction and checkpointing address complementary problems and should both be used.

The combined pattern: checkpoint the structured state (processed files, decisions, results) after every N turns, and enable compaction to handle the context window. If the agent is interrupted, restore from the checkpoint and resume; compaction will handle the new session's context growth just as it handled the previous one.
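A minimal sketch of the checkpoint half of this pattern, assuming JSON-serialisable structured state (the helper names and file layout are illustrative, not a prescribed API):

```python
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    # Write-then-rename so a crash mid-write never corrupts the checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path: str):
    # None signals a fresh run; otherwise resume from the saved state.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

# In the agent loop, e.g. every 10 turns:
#   if turn % 10 == 0:
#       save_checkpoint("agent_state.json", state)
# Compaction independently handles context growth within the session.

# Round-trip demo in a temporary directory:
demo_path = os.path.join(tempfile.mkdtemp(), "agent_state.json")
save_checkpoint(demo_path, {"processed_files": ["a.py"], "turn": 12})
restored = load_checkpoint(demo_path)
```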

The long-running agent reliability stack: structured state (explicit bookkeeping) + compaction (context continuity) + checkpointing (fault tolerance) + max_turns limit (cost control) + escalation path (human-in-the-loop on failure). Each layer addresses a distinct failure mode. Compaction is a critical new addition to this stack, but it works best alongside the others.