Human-in-the-Loop

Agent pauses at defined checkpoints for human review or approval before proceeding — essential for high-stakes workflows where errors are expensive to reverse.

Approval · Checkpoints · Risk Mitigation · LangGraph native support

SECTION 01

Why agents need humans in the loop

Fully autonomous agents are powerful but dangerous: a wrong decision early in a workflow can cascade into expensive, hard-to-reverse consequences. An agent that sends emails, modifies databases, or executes financial transactions without review can cause real-world harm from a single hallucination.

Human-in-the-Loop (HITL) is the pragmatic middle ground: the agent automates everything it can handle reliably, and pauses for human review at specific high-risk steps. The human doesn't need to watch every action — just the ones where errors are expensive.

Think of it like a junior employee: you don't watch them format spreadsheets, but you do review the client-facing report before it goes out. HITL gives agents the same operating model: autonomous for safe actions, supervised for consequential ones.

SECTION 02

Checkpoint strategies

Three patterns for deciding when to pause:

Fixed checkpoints: always pause at specific steps. "Always show the user the draft email before sending." Simple and predictable, but adds friction even when the stakes are low.

Risk-based checkpoints: pause when the action meets risk criteria. "Pause if the email is going to more than 10 recipients, or if it contains a financial commitment." More nuanced, but requires defining risk criteria upfront.

Confidence-based checkpoints: pause when the model's confidence is low. If the agent can express uncertainty ("I'm not sure whether to interpret this as X or Y"), surface that ambiguity to the user rather than guessing. Requires the agent to be calibrated about its own uncertainty — difficult but valuable.
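A minimal sketch of the confidence-based pattern, assuming the agent can attach a self-reported confidence score and a list of alternative interpretations to each decision (the `AgentDecision` structure and the 0.8 threshold are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class AgentDecision:
    action: str
    confidence: float                      # self-reported, 0.0-1.0
    alternatives: list[str] = field(default_factory=list)  # other readings considered

def should_pause(decision: AgentDecision, threshold: float = 0.8) -> bool:
    """Surface ambiguity to the user instead of guessing."""
    return decision.confidence < threshold or len(decision.alternatives) > 0

decision = AgentDecision("reply_to_thread", confidence=0.55,
                         alternatives=["start_new_thread"])
if should_pause(decision):
    print(f"Unsure between {decision.action} and {decision.alternatives} - asking user")
```

The hard part is not the predicate but getting calibrated confidence values out of the model in the first place.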

SECTION 03

Implementing HITL with interrupts

import json
from typing import Literal

def requires_approval(action: str, args: dict) -> bool:
    '''Define which actions need human approval.'''
    always_risky = {"delete_record", "execute_payment", "publish_post"}
    if action in always_risky:
        return True
    # send_email only needs approval for large recipient lists
    if action == "send_email" and len(args.get("recipients", [])) > 5:
        return True
    return False

def get_human_approval(action: str, args: dict) -> Literal["approve", "reject", "modify"]:
    '''In production: send to a queue, webhook, or UI. Here: stdin.'''
    print("\n⚠️  APPROVAL REQUIRED")
    print(f"Action: {action}")
    print(f"Arguments: {args}")
    choice = input("Approve [a], Reject [r], or Modify [m]? ").strip().lower()
    # Unrecognised input defaults to the safe option: reject
    return {"a": "approve", "r": "reject", "m": "modify"}.get(choice, "reject")

def execute_with_hitl(action: str, args: dict) -> str:
    if requires_approval(action, args):
        decision = get_human_approval(action, args)
        if decision == "reject":
            return f"Action '{action}' rejected by user."
        elif decision == "modify":
            # In production: let user edit args via UI
            args = json.loads(input("Enter modified args as JSON: "))
        # Fall through to execute if approved or modified
    return dispatch_tool(action, args)

def dispatch_tool(action: str, args: dict) -> str:
    '''Execute the actual tool.'''
    print(f"Executing {action} with {args}")
    return f"Completed: {action}"

SECTION 04

LangGraph checkpointing

LangGraph has first-class support for HITL via interrupt_before and interrupt_after. The graph state is persisted to a checkpointer (SQLite, Postgres, Redis) so the agent can be paused indefinitely and resumed after human input:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    draft: str
    approved: bool

def draft_node(state: AgentState) -> AgentState:
    # Generate a draft
    return {"draft": "Dear client, I'm writing to confirm..."}

def send_node(state: AgentState) -> AgentState:
    # Only reached after human approval
    print(f"Sending: {state['draft']}")
    return {"messages": [{"role": "system", "content": "Email sent."}]}

# Build graph
builder = StateGraph(AgentState)
builder.add_node("draft", draft_node)
builder.add_node("send", send_node)
builder.add_edge("draft", "send")
builder.add_edge("send", END)
builder.set_entry_point("draft")

# Add checkpointer + interrupt before the send node
checkpointer = MemorySaver()  # in production: SqliteSaver or PostgresSaver
graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["send"]  # pause before sending
)

# Run until interrupt
config = {"configurable": {"thread_id": "email-001"}}
state = graph.invoke({"messages": [], "draft": "", "approved": False}, config)
print(f"Draft ready for review: {graph.get_state(config).values['draft']}")

# Human reviews and resumes
print("Human approved. Resuming...")
final_state = graph.invoke(None, config)  # resume from checkpoint

SECTION 05

Async HITL for long-running agents

For production agents, synchronous blocking ("wait for user input in a while loop") doesn't scale. Use an async queue pattern:

import asyncio, uuid
from datetime import datetime

# In production: use Redis, Celery, or a webhook service
pending_approvals: dict[str, dict] = {}
approval_responses: dict[str, str] = {}

async def request_approval(action: str, args: dict) -> str:
    '''Send approval request and wait for response.'''
    approval_id = str(uuid.uuid4())
    pending_approvals[approval_id] = {
        "action": action,
        "args": args,
        "requested_at": datetime.now().isoformat()
    }
    # In production: send webhook/email/Slack message to approver
    print(f"Approval requested: {approval_id}")
    # Poll for response (in production: use event/callback)
    while approval_id not in approval_responses:
        await asyncio.sleep(1)
    return approval_responses.pop(approval_id)

def submit_approval(approval_id: str, decision: str):
    '''Called by the human (via API, UI, Slack button, etc.)'''
    approval_responses[approval_id] = decision

async def agent_with_async_hitl():
    # Run agent until checkpoint
    draft = "Draft email content..."
    decision = await request_approval("send_email", {"content": draft})
    if decision == "approved":
        print("Sending email...")
    else:
        print("Email cancelled.")

SECTION 06

Calibrating automation level

The right level of human involvement depends on the reversibility and impact of the action:

Fully automated (no approval needed): reading data, formatting, internal calculations, drafting (not sending), searching. Low risk, easily reversible.

Soft checkpoint (show result, proceed unless rejected): summaries, analysis reports, recommendations. The user can stop it, but the default is proceed.

Hard checkpoint (wait for explicit approval): sending messages, modifying records, financial actions, public posts. The agent waits indefinitely until approved or rejected.

Always manual: deleting data permanently, accessing sensitive credentials, executing large financial transactions. Even with approval, give extra friction ("type CONFIRM to proceed").

Start conservative (many checkpoints) and gradually automate as you build confidence in the agent's judgment for each action type.
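The four levels above can be encoded as a policy table the dispatcher consults before each action. A minimal sketch, with illustrative action names and level labels:

```python
# Maps each action to an automation level; unknown actions default to the
# most conservative level. Action names here are hypothetical examples.
AUTOMATION_POLICY = {
    "search_documents": "auto",    # fully automated, no approval
    "draft_email": "auto",
    "generate_summary": "soft",    # show result, proceed unless rejected
    "send_email": "hard",          # wait for explicit approval
    "update_record": "hard",
    "delete_record": "manual",     # extra friction: typed confirmation
    "execute_payment": "manual",
}

def automation_level(action: str) -> str:
    # Anything not explicitly classified requires manual sign-off
    return AUTOMATION_POLICY.get(action, "manual")
```

Keeping the policy in data rather than scattered `if` statements makes the "gradually automate" step a one-line change per action type.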

SECTION 07

Gotchas

Approval fatigue degrades oversight. If you require approval for every action, users start rubber-stamping without reading. Too many approvals = no effective oversight. Reserve hard checkpoints for genuinely consequential actions; automate the routine ones.

Timeouts need explicit handling. What happens if the human doesn't respond in 10 minutes? 24 hours? Define a timeout policy for each checkpoint: auto-reject (safest), escalate to another approver, or auto-approve (only for low-risk actions). Never leave the agent blocked indefinitely without a timeout.
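One way to make the timeout policy explicit is to wrap the wait for a human decision in `asyncio.wait_for`. A sketch, with illustrative policy names and timeout values:

```python
import asyncio

async def approval_with_timeout(approval_future: asyncio.Future,
                                timeout_s: float,
                                on_timeout: str = "reject") -> str:
    """Wait for a human decision, falling back to a per-checkpoint policy.

    on_timeout: "reject" (safest), "escalate", or "approve" (low-risk only).
    """
    try:
        return await asyncio.wait_for(approval_future, timeout=timeout_s)
    except asyncio.TimeoutError:
        if on_timeout == "escalate":
            # In production: notify a secondary approver; here we just report it
            return "escalated"
        return "approved" if on_timeout == "approve" else "rejected"

async def demo():
    # A future nobody ever resolves simulates a non-responsive approver
    never_answered = asyncio.get_running_loop().create_future()
    return await approval_with_timeout(never_answered, timeout_s=0.1)

print(asyncio.run(demo()))  # prints "rejected"
```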

Context must be shown to the approver. "Approve sending an email?" is not enough. Show the full email, the recipient list, and the context of why the agent is sending it. Approvals without context are meaningless and unsafe.

LangGraph state must be serialisable. When using LangGraph checkpointing, all state must be JSON-serialisable (no custom objects, no file handles). Design your state schema with serialisation in mind from the start.
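A cheap guard is to round-trip a sample of your state schema through `json.dumps` before wiring it into a checkpointer; the schema below is a hypothetical example:

```python
import json
from typing import TypedDict

class AgentState(TypedDict):
    messages: list[dict]   # JSON-safe primitives only
    draft: str
    approved: bool

# Sanity-check serialisability up front rather than at the first interrupt
state: AgentState = {"messages": [], "draft": "hi", "approved": False}
json.dumps(state)  # raises TypeError if anything in state is not serialisable

# A file handle or custom object in state would fail this check:
# json.dumps({"log": open("audit.log")})  -> TypeError
```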

SECTION 08

HITL Checkpoint Design Reference

Checkpoint Type          | When to Use                    | Timeout Handling                  | UX Pattern
Hard gate (blocking)     | Before irreversible actions    | Auto-cancel after TTL             | Approval button in UI
Soft gate (non-blocking) | Quality review of draft output | Auto-approve after TTL            | Editable preview panel
Asynchronous review      | Low-urgency long-running tasks | Queue for human review            | Task inbox with accept/reject
Inline clarification     | Ambiguous inputs mid-task      | Proceed with best guess after TTL | Chat message asking question

Design HITL checkpoints around business risk, not technical uncertainty. Agents should not ask for human approval on every step — that defeats the purpose of automation. Reserve hard gates for actions that are expensive to reverse (sending external communications, writing to production databases, making financial transactions) or that have regulatory requirements for human sign-off. For all other actions, use soft gates or asynchronous review with generous timeouts that allow the pipeline to proceed if the reviewer is unavailable.

Track checkpoint approval rates in production. If a checkpoint is approved more than 95% of the time without modification, it is likely too conservative and a candidate for removal or relaxation. If a checkpoint is modified on more than 30% of reviews, the upstream agent logic needs improvement. Use these metrics to continuously tune your HITL strategy rather than treating the initial design as permanent.
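The 95%/30% heuristics above can be sketched as a small metrics tracker; the counter structure and outcome labels are assumptions, not a specific library's API:

```python
from collections import Counter

class CheckpointMetrics:
    def __init__(self):
        self.counts: Counter = Counter()

    def record(self, checkpoint: str, outcome: str):
        # outcome: "approved", "modified", or "rejected"
        self.counts[(checkpoint, outcome)] += 1

    def review(self, checkpoint: str) -> str:
        total = sum(n for (cp, _), n in self.counts.items() if cp == checkpoint)
        if total == 0:
            return "no data"
        approved = self.counts[(checkpoint, "approved")] / total
        modified = self.counts[(checkpoint, "modified")] / total
        if approved > 0.95:
            return "too conservative - consider relaxing"
        if modified > 0.30:
            return "upstream agent logic needs improvement"
        return "healthy"
```

In production these counts would live in your metrics store; the point is that each verdict maps directly to a concrete tuning action.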

The placement of human checkpoints in an agentic pipeline is as important as their existence. Checkpoints too early in a workflow interrupt frequently on low-stakes decisions; checkpoints too late catch errors only after significant irreversible work has been done. Effective HITL design maps the risk profile of each action — reversible low-stakes actions skip review, irreversible or high-impact actions always pause for confirmation.

Asynchronous HITL patterns use message queues to decouple the agent from the human reviewer. The agent publishes a review request, pauses execution, and resumes when the queue delivers an approval or rejection. This allows the human reviewer to operate at their own pace without blocking the entire system, and supports audit logging of every decision point for compliance purposes.

Escalation policies define what happens when a human reviewer is unavailable or does not respond within a timeout window. Common strategies include: defaulting to the conservative action (do nothing and surface the issue), escalating to a senior reviewer, or allowing the agent to proceed with reduced confidence and flagging the action for post-hoc audit. The choice depends on whether the cost of delay exceeds the cost of the risk incurred by acting without review.