Agent pauses at defined checkpoints for human review or approval before proceeding — essential for high-stakes workflows where errors are expensive to reverse.
Fully autonomous agents are powerful but dangerous: a wrong decision early in a workflow can cascade into expensive, hard-to-reverse consequences. An agent that sends emails, modifies databases, or executes financial transactions without review can cause real-world harm from a single hallucination.
Human-in-the-Loop (HITL) is the pragmatic middle ground: the agent automates everything it can handle reliably, and pauses for human review at specific high-risk steps. The human doesn't need to watch every action — just the ones where errors are expensive.
Think of it like a junior employee: you don't watch them format spreadsheets, but you do review the client-facing report before it goes out. HITL gives agents the same operating model: autonomous for safe actions, supervised for consequential ones.
Three patterns for deciding when to pause:
Fixed checkpoints: always pause at specific steps. "Always show the user the draft email before sending." Simple and predictable, but adds friction even to trivial cases.
Risk-based checkpoints: pause when the action meets risk criteria. "Pause if the email is going to more than 10 recipients, or if it contains a financial commitment." More nuanced, but requires defining risk criteria upfront.
Confidence-based checkpoints: pause when the model's confidence is low. If the agent can express uncertainty ("I'm not sure whether to interpret this as X or Y"), surface that ambiguity to the user rather than guessing. Requires the agent to be calibrated about its own uncertainty — difficult but valuable.
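A confidence-based trigger can be sketched as a simple threshold check. The hard part in practice is the confidence score itself, which must come from the model (a self-assessment prompt, or an estimate derived from token log-probabilities); the `interpret_request` stub below is hypothetical and hard-codes scores purely for illustration:

```python
def should_pause(confidence: float, threshold: float = 0.8) -> bool:
    """Pause for human input when the agent's self-reported confidence is low."""
    return confidence < threshold

def interpret_request(request: str) -> tuple[str, float]:
    """Hypothetical stub: a real agent would return a model-derived score."""
    if "maybe" in request or " or " in request:
        return ("ambiguous", 0.4)
    return ("clear", 0.95)

interpretation, confidence = interpret_request("cancel the order, or maybe just pause it")
if should_pause(confidence):
    print(f"Low confidence ({confidence:.0%}) — asking the user to clarify.")
else:
    print(f"Proceeding with interpretation: {interpretation}")
```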
A minimal implementation of fixed plus risk-based checkpoints:

```python
import json
import anthropic
from typing import Literal

client = anthropic.Anthropic()  # used for drafting; model calls omitted here

def requires_approval(action: str, args: dict) -> bool:
    '''Define which actions need human approval.'''
    # Fixed checkpoints: these actions always pause. (send_email is
    # deliberately excluded so the recipient rule below is reachable.)
    risky_actions = {"delete_record", "execute_payment", "publish_post"}
    if action in risky_actions:
        return True
    # Risk-based checkpoint: bulk emails need a human look
    if action == "send_email" and len(args.get("recipients", [])) > 5:
        return True
    return False

def get_human_approval(action: str, args: dict) -> Literal["approve", "reject", "modify"]:
    '''In production: send to a queue, webhook, or UI. Here: stdin.'''
    print("\n⚠️ APPROVAL REQUIRED")
    print(f"Action: {action}")
    print(f"Arguments: {args}")
    choice = input("Approve [a], Reject [r], or Modify [m]? ").strip().lower()
    return {"a": "approve", "r": "reject", "m": "modify"}.get(choice, "reject")

def execute_with_hitl(action: str, args: dict) -> str:
    if requires_approval(action, args):
        decision = get_human_approval(action, args)
        if decision == "reject":
            return f"Action '{action}' rejected by user."
        elif decision == "modify":
            # In production: let the user edit args via a UI
            new_args = input("Enter modified args as JSON: ")
            args = json.loads(new_args)
        # Fall through to execute if approved or modified
    return dispatch_tool(action, args)

def dispatch_tool(action: str, args: dict) -> str:
    '''Execute the actual tool.'''
    print(f"Executing {action} with {args}")
    return f"Completed: {action}"
```
LangGraph has first-class support for HITL via `interrupt_before` and `interrupt_after`. The graph state is persisted to a checkpointer (SQLite, Postgres, Redis) so the agent can be paused indefinitely and resumed after human input:
```python
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    draft: str
    approved: bool

def draft_node(state: AgentState) -> AgentState:
    # Generate a draft (model call omitted)
    return {"draft": "Dear client, I'm writing to confirm..."}

def send_node(state: AgentState) -> AgentState:
    # Only reached after human approval
    print(f"Sending: {state['draft']}")
    return {"messages": [{"role": "system", "content": "Email sent."}]}

# Build the graph
builder = StateGraph(AgentState)
builder.add_node("draft", draft_node)
builder.add_node("send", send_node)
builder.add_edge("draft", "send")
builder.add_edge("send", END)
builder.set_entry_point("draft")

# Add a checkpointer + interrupt before the send node
checkpointer = MemorySaver()
graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["send"],  # pause before sending
)

# Run until the interrupt
config = {"configurable": {"thread_id": "email-001"}}
state = graph.invoke({"messages": [], "draft": "", "approved": False}, config)
print(f"Draft ready for review: {graph.get_state(config).values['draft']}")

# Human reviews and resumes
print("Human approved. Resuming...")
final_state = graph.invoke(None, config)  # resume from the checkpoint
```
For production agents, synchronous blocking ("wait for user input in a while loop") doesn't scale. Use an async queue pattern:
```python
import asyncio
import uuid
from datetime import datetime

# In production: use Redis, Celery, or a webhook service
pending_approvals: dict[str, dict] = {}
approval_responses: dict[str, str] = {}

async def request_approval(action: str, args: dict) -> str:
    '''Send an approval request and wait for the response.'''
    approval_id = str(uuid.uuid4())
    pending_approvals[approval_id] = {
        "action": action,
        "args": args,
        "requested_at": datetime.now().isoformat(),
    }
    # In production: send a webhook/email/Slack message to the approver
    print(f"Approval requested: {approval_id}")
    # Poll for the response (in production: use an event/callback)
    while approval_id not in approval_responses:
        await asyncio.sleep(1)
    pending_approvals.pop(approval_id)  # clean up the resolved request
    return approval_responses.pop(approval_id)

def submit_approval(approval_id: str, decision: str):
    '''Called by the human (via API, UI, Slack button, etc.)'''
    approval_responses[approval_id] = decision

async def agent_with_async_hitl():
    # Run the agent up to the checkpoint
    draft = "Draft email content..."
    decision = await request_approval("send_email", {"content": draft})
    if decision == "approved":
        print("Sending email...")
    else:
        print("Email cancelled.")
```
The right level of human involvement depends on the reversibility and impact of the action:
Fully automated (no approval needed): reading data, formatting, internal calculations, drafting (not sending), searching. Low risk, easily reversible.
Soft checkpoint (show result, proceed unless rejected): summaries, analysis reports, recommendations. The user can stop it, but the default is to proceed.
Hard checkpoint (wait for explicit approval): sending messages, modifying records, financial actions, public posts. The agent waits indefinitely until approved or rejected.
Always manual: deleting data permanently, accessing sensitive credentials, executing large financial transactions. Even with approval, give extra friction ("type CONFIRM to proceed").
Start conservative (many checkpoints) and gradually automate as you build confidence in the agent's judgment for each action type.
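The ladder above can be encoded as a simple lookup table. The action names and their assignments here are illustrative, not prescriptive:

```python
from enum import Enum

class Oversight(Enum):
    AUTOMATED = "automated"    # no approval needed
    SOFT = "soft_checkpoint"   # proceed unless rejected
    HARD = "hard_checkpoint"   # wait for explicit approval
    MANUAL = "always_manual"   # approval plus extra friction

# Illustrative mapping — tune per deployment
OVERSIGHT_LEVELS: dict[str, Oversight] = {
    "read_data": Oversight.AUTOMATED,
    "draft_email": Oversight.AUTOMATED,
    "generate_report": Oversight.SOFT,
    "send_email": Oversight.HARD,
    "execute_payment": Oversight.HARD,
    "delete_data_permanently": Oversight.MANUAL,
}

def oversight_for(action: str) -> Oversight:
    # Unknown actions default to a hard checkpoint: fail safe first,
    # then relax as confidence in the agent's judgment grows.
    return OVERSIGHT_LEVELS.get(action, Oversight.HARD)
```

Defaulting unknown actions to a hard checkpoint matches the "start conservative" advice: new tools begin supervised and are relaxed deliberately, not by accident.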
Approval fatigue degrades oversight. If you require approval for every action, users start rubber-stamping without reading. Too many approvals = no effective oversight. Reserve hard checkpoints for genuinely consequential actions; automate the routine ones.
Timeouts need explicit handling. What happens if the human doesn't respond in 10 minutes? 24 hours? Define a timeout policy for each checkpoint: auto-reject (safest), escalate to another approver, or auto-approve (only for low-risk actions). Never leave the agent blocked indefinitely without a timeout.
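One way to make the timeout policy explicit is to wrap the wait in `asyncio.wait_for`. The `wait_for_decision` coroutine below is a hypothetical stand-in for a real event or queue wait, and the auto-reject default is an assumption you would set per checkpoint:

```python
import asyncio

async def wait_for_decision(approval_id: str) -> str:
    """Hypothetical: resolves when a human submits a decision."""
    await asyncio.sleep(3600)  # stand-in for a real event/queue wait
    return "approve"

async def approval_with_timeout(approval_id: str, timeout_s: float,
                                on_timeout: str = "reject") -> str:
    """Wait for a human decision, falling back to an explicit timeout policy."""
    try:
        return await asyncio.wait_for(wait_for_decision(approval_id),
                                      timeout=timeout_s)
    except asyncio.TimeoutError:
        # Auto-reject is the safest default; use "approve" only for
        # low-risk actions, or "escalate" to route to another reviewer.
        return on_timeout

# Demo: with a 0.01s timeout, the stand-in never resolves in time.
decision = asyncio.run(approval_with_timeout("req-42", timeout_s=0.01))
print(decision)  # reject
```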
Context must be shown to the approver. "Approve sending an email?" is not enough. Show the full email, the recipient list, and the context of why the agent is sending it. Approvals without context are meaningless and unsafe.
LangGraph state must be serialisable. When using LangGraph checkpointing, all state must be JSON-serialisable (no custom objects, no file handles). Design your state schema with serialisation in mind from the start.
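For example, storing a timestamp as an ISO string instead of a `datetime` object keeps the state JSON-serialisable. This is a general sketch of the design principle, not a LangGraph-specific API:

```python
import json
from datetime import datetime, timezone
from typing import TypedDict

class SerializableState(TypedDict):
    draft: str
    created_at: str           # ISO string, not a datetime object
    recipient_ids: list[int]  # plain lists/dicts/scalars only

state: SerializableState = {
    "draft": "Dear client...",
    "created_at": datetime.now(timezone.utc).isoformat(),
    "recipient_ids": [101, 102],
}

# Round-trips cleanly — a datetime or file handle here would raise TypeError.
assert json.loads(json.dumps(state)) == state
```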
| Checkpoint Type | When to Use | Timeout Handling | UX Pattern |
|---|---|---|---|
| Hard gate (blocking) | Before irreversible actions | Auto-cancel after TTL | Approval button in UI |
| Soft gate (non-blocking) | Quality review of draft output | Auto-approve after TTL | Editable preview panel |
| Asynchronous review | Low-urgency long-running tasks | Queue for human review | Task inbox with accept/reject |
| Inline clarification | Ambiguous inputs mid-task | Proceed with best-guess after TTL | Chat message asking question |
Design HITL checkpoints around business risk, not technical uncertainty. Agents should not ask for human approval on every step — that defeats the purpose of automation. Reserve hard gates for actions that are expensive to reverse (sending external communications, writing to production databases, making financial transactions) or that have regulatory requirements for human sign-off. For all other actions, use soft gates or asynchronous review with generous timeouts that allow the pipeline to proceed if the reviewer is unavailable.
Track checkpoint approval rates in production. If a checkpoint is approved more than 95% of the time without modification, it is likely too conservative and a candidate for removal or relaxation. If a checkpoint is modified on more than 30% of reviews, the upstream agent logic needs improvement. Use these metrics to continuously tune your HITL strategy rather than treating the initial design as permanent.
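These rates are easy to compute from a decision log; the log schema here is an assumption:

```python
def checkpoint_metrics(log: list[dict]) -> dict[str, float]:
    """Compute approval and modification rates from a decision log.

    Each entry is assumed to look like:
    {"checkpoint": "send_email", "decision": "approve" | "reject" | "modify"}
    """
    total = len(log)
    approved = sum(1 for e in log if e["decision"] == "approve")
    modified = sum(1 for e in log if e["decision"] == "modify")
    return {
        "approval_rate": approved / total,
        "modification_rate": modified / total,
    }

log = ([{"checkpoint": "send_email", "decision": "approve"}] * 96
       + [{"checkpoint": "send_email", "decision": "modify"}] * 4)
m = checkpoint_metrics(log)
# approval_rate 0.96 exceeds the 0.95 threshold: candidate for relaxation.
print(m)
```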
The placement of human checkpoints in an agentic pipeline is as important as their existence. Checkpoints too early in a workflow interrupt frequently on low-stakes decisions; checkpoints too late catch errors only after significant irreversible work has been done. Effective HITL design maps the risk profile of each action — reversible low-stakes actions skip review, irreversible or high-impact actions always pause for confirmation.
Asynchronous HITL patterns use message queues to decouple the agent from the human reviewer. The agent publishes a review request, pauses execution, and resumes when the queue delivers an approval or rejection. This allows the human reviewer to operate at their own pace without blocking the entire system, and supports audit logging of every decision point for compliance purposes.
Escalation policies define what happens when a human reviewer is unavailable or does not respond within a timeout window. Common strategies include: defaulting to the conservative action (do nothing and surface the issue), escalating to a senior reviewer, or allowing the agent to proceed with reduced confidence and flagging the action for post-hoc audit. The choice depends on whether the cost of delay exceeds the cost of the risk incurred by acting without review.