Agents are powerful but expensive and unpredictable. Deterministic pipelines are cheaper and debuggable. A decision framework for choosing the right architecture for your task.
A pipeline is a fixed sequence of LLM calls and code steps that you control completely — retrieve, extract, summarise, format. Every step is predictable, fast, and cheap. An agent uses an LLM to decide which tools to call and in what order, looping until the task is complete. Agents handle open-ended tasks that don't fit a fixed sequence, but they're slower, more expensive, harder to debug, and can fail in unexpected ways.
The practical advice: start with a pipeline and only reach for agents when the task genuinely requires dynamic decision-making that you can't encode in a fixed workflow.
Use a deterministic pipeline when:

- You can write every step as a fixed flowchart or DAG.
- Latency must stay low and predictable (e.g. under 5 seconds).
- Cost per request must be bounded.
- You need a high, measurable success rate (e.g. above 95%).

Use an agent when the task genuinely requires dynamic decision-making:

- The task involves open-ended research or exploration.
- The right tool must be selected at runtime from many (5+) options.
- Task length varies wildly (1 to 50 steps) and can't be predetermined.
Rule of thumb: if you can write a flowchart for the task in 15 minutes, use a pipeline. If the flowchart has too many branches to draw, consider an agent.
Most production systems are hybrids: an agent orchestrates high-level decisions while pipelines handle well-defined sub-tasks. A document analysis flow, for example, might keep ingestion and formatting as fixed pipeline steps while delegating the core analysis to a bounded agent.
```python
def choose_architecture(answers: dict[str, bool]) -> str:
    """Decide pipeline vs agent from yes/no answers to these questions."""
    questions = [
        ("Can you write all steps as a fixed flowchart?", "pipeline"),
        ("Is latency under 5 s required?", "pipeline"),
        ("Is a bounded cost per request critical?", "pipeline"),
        ("Does the success rate need to exceed 95%?", "pipeline"),
        ("Does the task require open-ended web research?", "agent"),
        ("Does the task require tool selection from 5+ options?", "agent"),
        ("Does task length vary wildly (1-50 steps)?", "agent"),
    ]
    # Only reach for an agent when all three agent-side conditions hold;
    # otherwise default to the cheaper, more predictable pipeline.
    agent_questions = [q for q, arch in questions if arch == "agent"]
    if all(answers.get(q, False) for q in agent_questions):
        return "agent"
    return "pipeline"
```

A practical decision tree:

1. Can I write a fixed DAG? Yes -> pipeline.
2. Is this customer-facing, where latency and reliability matter? Yes -> pipeline.
3. Is this an internal or research task with a flexible budget? Maybe an agent.
4. Does the task have genuinely open-ended structure? Yes -> agent.
For a "research and summarise" task, where the steps genuinely depend on what is found along the way, an agent can earn its cost. For a simple extraction task where a pipeline works perfectly, using an agent is ~23× more expensive and ~5× slower. Over 1M requests/month, that's the difference between $3k and $70k in LLM costs. Use agents only where their dynamic capability is actually needed.
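The arithmetic behind those numbers can be sketched as a back-of-envelope model. The per-request cost below is an illustrative assumption, not a measurement; only the ~23× ratio and monthly volume come from the comparison above:

```python
def monthly_llm_cost(cost_per_request: float, requests_per_month: int) -> float:
    """Project monthly LLM spend from per-request cost and volume."""
    return cost_per_request * requests_per_month

# Hypothetical figure: $0.003/request for a two-step pipeline,
# and ~23x that for an agent doing the same extraction.
pipeline_monthly = monthly_llm_cost(0.003, 1_000_000)        # ~$3,000
agent_monthly = monthly_llm_cost(0.003 * 23, 1_000_000)      # ~$69,000
```

Plugging in the numbers makes the pattern obvious: per-request differences that look negligible in a demo compound linearly with volume.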
Pipeline implementations use deterministic control flow — if/else branching, for loops, function composition — to sequence LLM calls and tool invocations. The predictable execution graph makes pipelines easy to test, debug, and optimize: each step can be unit-tested independently, failures are isolated to specific steps, and latency is predictable because the execution path is fixed. Agent implementations use the LLM itself to determine what to do next, creating dynamic execution graphs that adapt to task requirements but are harder to test comprehensively because all execution paths cannot be enumerated.
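That step-level testability can be shown concretely: a single pipeline step can be unit-tested against a faked client, with no network and no cost. The `summarize_step` helper and the mock wiring below are illustrative, not part of the OpenAI SDK:

```python
from unittest.mock import MagicMock

def summarize_step(client, text: str) -> str:
    """One isolated pipeline step: summarize the given text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Summarize in 3 sentences:\n{text}"}],
    )
    return response.choices[0].message.content

# Unit test with a mocked client: deterministic, fast, free.
fake = MagicMock()
fake.chat.completions.create.return_value.choices[0].message.content = "short summary"
assert summarize_step(fake, "some long text") == "short summary"
```

Because each step takes its inputs explicitly, the same pattern covers every step in the pipeline; agents resist this kind of isolation because the step sequence itself is not fixed.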
| Dimension | Pipeline | Agent |
|---|---|---|
| Control flow | Deterministic (code) | Dynamic (LLM-decided) |
| Testability | High (unit-testable steps) | Low (non-deterministic paths) |
| Latency | Predictable | Variable (unknown steps) |
| Cost | Fixed per run | Variable (depends on task) |
| Failure modes | Localized, recoverable | Cascading, harder to diagnose |
```python
from openai import OpenAI

client = OpenAI()

# Pipeline approach: two deterministic steps, fixed cost and latency.
def pipeline_summarize_and_translate(text: str, target_lang: str) -> str:
    # Step 1: summarize
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Summarize in 3 sentences:\n{text}"}],
    ).choices[0].message.content

    # Step 2: translate (always runs, predictable cost)
    translated = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Translate to {target_lang}:\n{summary}"}],
    ).choices[0].message.content
    return translated

# vs the agent approach: the LLM decides whether translation is needed
# (more flexible, but with unpredictable cost and step count).
```
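The agent alternative can be sketched as a loop in which a policy picks the next action; in production that policy is an LLM call with tool schemas, but here a toy policy stands in so the control flow is runnable. All names below are illustrative:

```python
from typing import Callable

def run_agent(task: str,
              decide_next_action: Callable[[str, list[str]], str],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Agent loop: the policy picks the next tool until it says 'finish'."""
    history: list[str] = []
    for _ in range(max_steps):
        action = decide_next_action(task, history)  # in production: an LLM call
        if action == "finish":
            break
        result = tools[action](task)
        history.append(f"{action}: {result}")
    return "\n".join(history)

# Toy policy and tool so the loop runs without an LLM.
def toy_policy(task: str, history: list[str]) -> str:
    return "search" if not history else "finish"

out = run_agent("find the capital of France",
                toy_policy,
                {"search": lambda t: "Paris"})
# out == "search: Paris"
```

The structural difference is visible in the types: the pipeline's call sequence is fixed in code, while here the sequence only exists at runtime, as a function of whatever the policy decides.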
Monitoring and observability requirements differ substantially between pipelines and agents. Pipeline monitoring logs fixed-step execution traces — step name, input, output, latency, errors — that map naturally to structured logging and distributed tracing systems. Agent monitoring must capture dynamic execution traces of variable length, track tool call sequences that were not predetermined, and identify when agents loop or make unproductive sequences of calls. Specialized agent tracing tools like LangSmith, Langfuse, and Arize Phoenix are designed for this dynamic trace structure, providing visualizations and analysis that general-purpose APM tools don't support.
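As a sketch of what such a dynamic trace might record — assuming nothing about any particular tracing tool's schema; the field names and the crude loop check below are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentStepTrace:
    """One entry in a variable-length agent trace."""
    step: int
    tool: str
    tool_input: str
    tool_output: str
    latency_ms: float

@dataclass
class AgentRunTrace:
    task: str
    steps: list[AgentStepTrace] = field(default_factory=list)

    def looks_stuck(self, window: int = 3) -> bool:
        """Crude loop detection: same tool repeated over the last few steps."""
        recent = [s.tool for s in self.steps[-window:]]
        return len(recent) == window and len(set(recent)) == 1
```

Even this minimal record supports the questions pipeline-style logging cannot answer: how many steps did this run take, which tools did it cycle through, and did it stall.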
Cost control is fundamentally different between pipelines and agents. Pipeline costs are deterministic and predictable — the cost per request equals the sum of token costs for each fixed step. Agent costs are variable and potentially unbounded — a misbehaving agent can make hundreds of tool calls before giving up, incurring costs orders of magnitude higher than expected. Budget limits (maximum LLM calls, maximum token spend per request) are essential safety mechanisms for agent deployments, as are circuit breakers that terminate agent loops when no progress is detected after a configurable number of steps.
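A minimal sketch of such a budget guard, with hypothetical class names and thresholds — the agent loop would call `charge` after every LLM call and let the exception terminate the run:

```python
class AgentBudgetExceeded(Exception):
    """Raised when an agent run exceeds its hard spend limits."""

class AgentBudget:
    """Hard per-request limits on agent spend (names/limits illustrative)."""

    def __init__(self, max_llm_calls: int = 20, max_tokens: int = 50_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        """Record one LLM call; raise once either limit is exceeded."""
        self.llm_calls += 1
        self.tokens += tokens_used
        if self.llm_calls > self.max_llm_calls:
            raise AgentBudgetExceeded(f"exceeded {self.max_llm_calls} LLM calls")
        if self.tokens > self.max_tokens:
            raise AgentBudgetExceeded(f"exceeded {self.max_tokens} tokens")
```

A progress-based circuit breaker layers on top of this: instead of counting calls, it compares recent tool outputs and aborts when nothing new is being produced after a configurable number of steps.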
The decision between pipeline and agent architecture frequently comes down to the variance in the input task structure. If all tasks have the same structure and require the same steps, a pipeline is always more appropriate — it is cheaper, faster, more reliable, and easier to maintain than an agent performing the same deterministic sequence. Agents add value only when task structure genuinely varies and the correct sequence of steps cannot be determined without reasoning about the specific task. Many applications that are initially built as agents can be refactored as pipelines once the actual distribution of task types is understood from production data.
Pipeline composition using function chaining or dataclass-based message passing provides explicit data flow visibility that agents lack. In a pipeline, the output type of each step is the input type of the next step — type annotations and runtime validation can enforce this contract, catching mismatches at development time rather than in production. This strongly-typed data flow is one of the most undervalued advantages of pipelines over agents: the compiler or type checker can verify the entire pipeline's structure before any LLM calls are made, enabling safe refactoring and early error detection that agentic architectures with dynamic tool selection cannot provide.
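A minimal sketch of that dataclass-based message passing, with stand-in functions in place of real LLM calls — the types, not the logic, are the point; all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class RawDocument:
    text: str

@dataclass
class Summary:
    sentences: list[str]

@dataclass
class Translation:
    language: str
    text: str

def summarize(doc: RawDocument) -> Summary:
    # Stand-in for an LLM call; only the type contract matters here.
    return Summary(sentences=doc.text.split(". ")[:3])

def translate(summary: Summary, language: str) -> Translation:
    # Stand-in for an LLM call.
    return Translation(language=language, text=" ".join(summary.sentences))

def pipeline(doc: RawDocument) -> Translation:
    # Each step's output type is the next step's input type; a type
    # checker verifies this chain before any LLM call is ever made.
    return translate(summarize(doc), "fr")
```

Swapping two steps or passing a `RawDocument` where a `Summary` is expected becomes a static error under mypy or pyright, rather than a malformed prompt discovered in production.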
Hybrid architectures that use pipelines for the predictable outer structure and agents for bounded sub-tasks within that structure often provide the best practical tradeoff. For example, a document analysis pipeline might have fixed steps for document ingestion, chunking, and output formatting, but delegate the core analysis to an agent with a limited set of domain-specific tools. This structure preserves pipeline predictability for the high-level flow while allowing agent flexibility for the analysis step where the reasoning path genuinely varies. Bounding agent execution (max steps, specific tool set, structured output schema) within a pipeline context is the architectural pattern that most often succeeds in production LLM applications.
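That bounded-agent-inside-a-pipeline shape can be sketched as follows; every function here is a stub standing in for a real step, and the cap and stop condition are illustrative:

```python
def ingest(path: str) -> str:
    # Fixed pipeline step (stubbed): load and normalise the document.
    return f"contents of {path}"

def bounded_analysis_agent(text: str, max_steps: int = 5) -> str:
    # Agent sub-task with a hard step cap and a fixed tool set; the
    # break stands in for the agent deciding it is done.
    findings: list[str] = []
    for step in range(max_steps):
        findings.append(f"finding {step + 1}")
        if len(findings) >= 2:
            break
    return "; ".join(findings)

def format_report(analysis: str) -> str:
    # Fixed pipeline step: deterministic output formatting.
    return f"REPORT: {analysis}"

def document_pipeline(path: str) -> str:
    # Pipeline outer structure; the agent runs only in the bounded
    # middle step, so cost and latency stay roughly predictable.
    return format_report(bounded_analysis_agent(ingest(path)))
```

The outer function is as testable as any pipeline; the variance introduced by the agent is confined to one step with a known worst case.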