RAG · Agents · Research

Agentic Search

Search agents that iteratively plan queries, evaluate results, and reformulate until the information need is fully satisfied

Iterative Retrieval · Deep Research Application · Multi-Source Synthesis

Table of Contents

SECTION 01

What Is Agentic Search?

Agentic Search turns a single-shot query into a multi-step information gathering process. A search agent iteratively formulates queries, evaluates the results, identifies information gaps, and reformulates — continuing until the information need is fully satisfied or a predefined confidence threshold is met.

The contrast with standard search is stark. A standard search engine (or RAG pipeline) takes one query, returns one result set, and is done. An agentic search agent might issue 5–20 queries across different sources, follow up on promising leads, compare conflicting information, and synthesize a comprehensive answer that no single search result contains.

The commercial manifestation of this is "deep research" products: Perplexity Deep Research, OpenAI Deep Research, and similar tools that spend minutes (not milliseconds) on a research question to produce a thorough, cited report. These are agentic search systems.

Primary use cases: deep research reports, multi-source synthesis, questions whose answers are not contained in any single document, and research that needs current web information beyond the model's training data.

Vs. Agentic RAG: Agentic RAG applies iterative retrieval over a private, indexed knowledge base. Agentic Search typically operates over the open web (using a search API) but the distinction blurs when both patterns are combined.
SECTION 02

The Search Loop

The core loop of an agentic search agent has five components that cycle until completion:

Research question
        │
┌───────▼─────────────────────────────┐
│ Search Agent (LLM)                  │
│                                     │
│ 1. Plan: what do I need to find?    │
│    Decompose into sub-questions     │
│                                     │
│ 2. Search: issue targeted query     │
│                                     │
│ 3. Evaluate: are results relevant?  │
│    sufficient? contradictory?       │
│                                     │
│ 4. Decide: done? or need more?      │
│    If more: reformulate query       │
│                                     │
│ 5. Synthesize: write final answer   │
│    with citations                   │
└───────┬─────────────────────────────┘
        │
┌───────▼─────────────────────────────┐
│ Search Tools                        │
│   web_search(query)                 │
│   read_url(url) → full text         │
│   news_search(query, date_range)    │
└─────────────────────────────────────┘

The loop terminates when the agent's confidence is high enough, the search budget is exhausted, or the agent explicitly signals it has gathered sufficient evidence.

Plan before you search: Agents that plan sub-questions upfront consistently outperform agents that search reactively. Spend one LLM call decomposing the question into 3–5 sub-questions before issuing any searches. This focuses the search and prevents circular querying.
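As a minimal sketch of this planning step (the prompt wording and the `parse_subquestions` helper are illustrative, not from any particular library): one LLM call returns numbered sub-questions, and a small parser turns them into a work list for the search loop.

```python
import re

# Hypothetical planner prompt; the exact wording is an assumption.
PLAN_PROMPT = (
    "Before searching, decompose this research question into 3-5 specific "
    "sub-questions, one per line, numbered:\n\n{question}"
)

def parse_subquestions(reply: str) -> list[str]:
    """Pull numbered sub-questions out of the planner's reply."""
    subs = []
    for line in reply.splitlines():
        m = re.match(r"\s*\d+[.)]\s*(.+)", line)  # matches "1. ..." or "2) ..."
        if m:
            subs.append(m.group(1).strip())
    return subs

# Example planner reply (what one such LLM call might return):
reply = """1. What retrieval strategies do current RAG systems use?
2) How do costs compare at production scale?
3. What are the main failure modes?"""
print(parse_subquestions(reply))
```

Each parsed sub-question then becomes a targeted query in step 2 of the loop.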
SECTION 03

Search Tools

The tools available to the agent determine the quality ceiling of its research. More diverse tools = broader coverage.

Core tools: web_search(query) for general retrieval, read_url(url) to fetch the full text of a page, and news_search(query, date_range) for time-scoped queries.

Specialized tools for domain research:

# Tavily search integration (optimized for LLM agents)
# pip install tavily-python
from tavily import TavilyClient

tavily = TavilyClient(api_key="YOUR_KEY")

def web_search(query: str, max_results: int = 5) -> str:
    results = tavily.search(
        query=query,
        max_results=max_results,
        include_raw_content=False,  # Set True to get full page text
        search_depth="advanced",    # "basic" is faster/cheaper
    )
    formatted = []
    for r in results.get("results", []):
        formatted.append(f"[{r['url']}]\n{r['title']}\n{r['content']}\n")
    return "\n---\n".join(formatted) if formatted else "No results found."
SECTION 04

Stopping Criteria

One of the most important design decisions in an agentic search system: when does the agent stop? An agent that stops too early gives incomplete answers. One that never stops exhausts the budget and user patience.

Explicit stopping signals: The cleanest approach is to give the agent a finish(summary) tool that it calls when it decides it has sufficient information. Combine with a max_steps hard limit as a safety net.

Coverage-based stopping: After each round, ask the agent to list remaining information gaps. When the gap list is empty, stop. This is more reliable than asking "are you done?" which LLMs tend to answer affirmatively.
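One way to implement the gap-list check, assuming the agent is prompted each round to list remaining gaps one per line or to write NONE (the `remaining_gaps` helper is a hypothetical sketch, not an established API):

```python
def remaining_gaps(agent_reply: str) -> list[str]:
    """Parse a gap-listing reply; an empty list means stop searching."""
    lines = [l.strip("-• ").strip() for l in agent_reply.strip().splitlines()]
    return [l for l in lines if l and l.upper() != "NONE"]

# The loop stops when remaining_gaps(...) returns []:
print(remaining_gaps("- current pricing\n- benchmark results"))
print(remaining_gaps("NONE"))
```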

Confidence scoring: Ask the agent to rate its confidence (0–1) in its current answer at each step. Stop when confidence exceeds a threshold (e.g. 0.85) or when additional search rounds don't increase confidence.

Diminishing returns detection: Track whether each search round added new, non-redundant information. If two consecutive rounds return only information already in the context, stop.
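A rough way to detect redundancy is token-set overlap between each round's results and everything gathered so far. The `jaccard` measure and the 0.8 threshold here are illustrative choices under that assumption, not a standard:

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts (0 = disjoint, 1 = identical)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def adds_new_information(result: str, seen: list[str], threshold: float = 0.8) -> bool:
    """A round is redundant if its text heavily overlaps something already seen."""
    return all(jaccard(result, s) < threshold for s in seen)

def should_stop(redundant_streak: int) -> bool:
    """Stop after two consecutive rounds that added nothing new."""
    return redundant_streak >= 2
```

In the loop, increment `redundant_streak` whenever `adds_new_information` returns False and reset it otherwise.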

# Stopping with a finish tool
TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for information.",
        "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
    },
    {
        "name": "finish",
        "description": "Call this when you have gathered sufficient information to answer the research question comprehensively. Pass your final synthesized answer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "Complete, cited research synthesis"},
                "confidence": {"type": "number", "description": "Confidence 0.0-1.0 in completeness"},
            },
            "required": ["answer"],
        },
    },
]
SECTION 05

Implementation

A complete agentic search agent with web search and full URL reading:

import anthropic

client = anthropic.Anthropic()

# Mock search (replace with Tavily/Brave/Serper)
def web_search(query: str) -> str:
    return f"[Mock] Searching: {query}\nResult: Found 3 articles discussing {query}."

def read_url(url: str) -> str:
    return f"[Mock] Full article from {url}: Detailed content about the topic..."

TOOLS = [
    {"name": "web_search",
     "description": "Search the web for current information.",
     "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}},
    {"name": "read_url",
     "description": "Read the full content of a URL for detailed information.",
     "input_schema": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}},
    {"name": "finish",
     "description": "Return the final research synthesis when done gathering information.",
     "input_schema": {"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}},
]

SYSTEM = """You are a deep research agent. When given a research question:
1. First, plan: decompose the question into 3-5 specific sub-questions
2. Search for each sub-question using targeted queries
3. Read full articles from the most relevant URLs
4. Cross-reference and verify key claims across multiple sources
5. When you have comprehensive coverage, call finish() with a well-cited synthesis
Always cite sources with URLs in your final answer."""

def agentic_search(question: str, max_steps: int = 15) -> str:
    messages = [{"role": "user", "content": question}]
    searches_done = set()
    urls_read = set()

    for step in range(max_steps):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=3000,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})

        if resp.stop_reason == "end_turn":
            # Model finished without calling the finish tool
            for block in reversed(resp.content):
                if hasattr(block, "text"):
                    return block.text
            break

        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            if block.name == "finish":
                return block.input.get("answer", "Research complete.")
            elif block.name == "web_search":
                query = block.input["query"]
                searches_done.add(query)
                result = web_search(query)
                print(f"[search] {query}")
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
            elif block.name == "read_url":
                url = block.input["url"]
                urls_read.add(url)
                content = read_url(url)
                print(f"[read] {url}")
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": content})

        if tool_results:
            messages.append({"role": "user", "content": tool_results})

    return "Research agent reached step limit."

if __name__ == "__main__":
    result = agentic_search(
        "What are the key differences between RAG architectures and "
        "what are the production tradeoffs as of 2025-2026?"
    )
    print(result)
SECTION 06

Agentic vs Standard RAG

Understanding when to use agentic search versus standard RAG is essential for cost-effective system design.

Use standard RAG when: The question has a clear, well-scoped answer. Your knowledge base is comprehensive for the domain. Users need answers in under 2 seconds. The question requires information from your internal documents only. Cost is a primary constraint.

Use Agentic Search when: The question requires synthesizing information from multiple perspectives or sources. The answer is not fully contained in any single document. Currency matters — you need recent web information beyond your training data. The user is doing research, not just fact-lookup. Quality and completeness matter more than latency.

Latency comparison: a standard RAG pipeline typically answers in under 2 seconds; an agentic search run takes tens of seconds to several minutes, scaling with the number of search rounds.

Cost comparison (rough): standard RAG costs one retrieval plus one LLM call per question; agentic search issues 5–20 searches with an LLM call per loop iteration, typically an order of magnitude more per question.

Hybrid approach: Many production systems combine both. Route simple, well-scoped questions to standard RAG (fast, cheap). Route open-ended research questions to agentic search (thorough, expensive). A router classifier can automate this with ~90% accuracy.
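A sketch of such a router, using a keyword heuristic purely for illustration (the signal list below is invented; production routers are usually a small LLM or trained classifier, as the ~90% figure above refers to):

```python
# Hypothetical signals that a question is open-ended research, not fact-lookup.
RESEARCH_SIGNALS = ("compare", "tradeoffs", "landscape", "comprehensive",
                    "latest", "research", "pros and cons")

def route(question: str) -> str:
    """Route well-scoped questions to standard RAG, open-ended ones to agentic search."""
    q = question.lower()
    if any(sig in q for sig in RESEARCH_SIGNALS) or len(q.split()) > 20:
        return "agentic_search"
    return "standard_rag"

print(route("What is our refund policy?"))
print(route("Compare the tradeoffs between RAG architectures"))
```

The same routing decision can be made by a cheap LLM call that classifies the question into the two buckets; the heuristic above just shows where the branch sits in the pipeline.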
SECTION 07

Production Patterns

Running agentic search reliably in production requires handling failure modes that don't exist in simpler systems.

Query deduplication: Track all queries issued in the current session. If the agent attempts to issue a query it has already run (or a very similar one), return the cached result and note it was already searched. This prevents expensive circular search loops.

Source quality filtering: Not all web sources are equal. Maintain a domain allowlist (for high-stakes research) or blocklist (for known low-quality sources). Pass source metadata to the agent so it can weight sources appropriately.

Conflict resolution: When the agent finds contradictory information, instruct it to explicitly flag the conflict, note the sources holding each position, and assess which is more authoritative rather than silently picking one. This makes the output auditable.

Streaming results: For research tasks taking 1+ minutes, stream intermediate progress to the user ("Searching for X...", "Reading article Y...", "Found 3 sources on Z..."). Users tolerate long waits much better when they can see progress.

Citation format: Enforce citation format in the system prompt. Every factual claim should include an inline citation [Source: URL]. Post-process the final answer to verify all URLs are reachable and map to the claimed content.
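A minimal post-processing sketch, assuming the [Source: URL] citation convention described above; the reachability check (an HTTP request per URL) is deliberately left out here:

```python
import re

CITATION_RE = re.compile(r"\[Source:\s*(https?://[^\]\s]+)\]")

def extract_citations(answer: str) -> list[str]:
    """Collect every [Source: URL] citation in the final answer."""
    return CITATION_RE.findall(answer)

def uncited_sentences(answer: str) -> list[str]:
    """Flag sentences carrying no citation, for manual review or a retry prompt."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not CITATION_RE.search(s)]

answer = "RAG is fast [Source: https://example.com/rag]. Agents are slower."
print(extract_citations(answer))
print(uncited_sentences(answer))
```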

Transparency builds trust: Show users what the agent searched for and which sources it relied on. Auditable search traces dramatically increase user confidence in agentic search outputs, especially for high-stakes research.