SECTION 01
What Is Agentic Search?
Agentic Search turns a single-shot query into a multi-step information gathering process. A search agent iteratively formulates queries, evaluates the results, identifies information gaps, and reformulates, continuing until the information need is fully satisfied or a predefined confidence threshold is met.
The contrast with standard search is stark. A standard search engine (or RAG pipeline) takes one query, returns one result set, and is done. An agentic search agent might issue 5-20 queries across different sources, follow up on promising leads, compare conflicting information, and synthesize a comprehensive answer that no single search result contains.
The commercial manifestation of this is "deep research" products: Perplexity Deep Research, OpenAI Deep Research, and similar tools that spend minutes (not milliseconds) on a research question to produce a thorough, cited report. These are agentic search systems.
Primary use cases:
- Deep research assistance: Comprehensive literature reviews, competitive analysis, due diligence
- Knowledge synthesis: Combining information from multiple sources with different perspectives
- Monitoring and alerting: Periodically search for changes on a topic and summarize what's new
- Fact verification: Cross-check a claim against multiple independent sources
- Expert Q&A: Answer complex domain questions that require integrating multiple specialized sources
Vs. Agentic RAG: Agentic RAG applies iterative retrieval over a private, indexed knowledge base. Agentic Search typically operates over the open web (using a search API) but the distinction blurs when both patterns are combined.
SECTION 02
The Search Loop
The core loop of an agentic search agent has five components that cycle until completion:
Research question
        |
        v
+----------------------------------------+
|  Search Agent (LLM)                    |
|                                        |
|  1. Plan: what do I need to find?      |
|     Decompose into sub-questions       |
|                                        |
|  2. Search: issue targeted query       |
|                                        |
|  3. Evaluate: are results relevant?    |
|     sufficient? contradictory?         |
|                                        |
|  4. Decide: done? or need more?        |
|     If more: reformulate query         |
|                                        |
|  5. Synthesize: write final answer     |
|     with citations                     |
+-------------------+--------------------+
                    |
                    v
+----------------------------------------+
|  Search Tools                          |
|    web_search(query)                   |
|    read_url(url) -> full text          |
|    news_search(query, date_range)      |
+----------------------------------------+
The loop terminates when the agent's confidence is high enough, the search budget is exhausted, or the agent explicitly signals it has gathered sufficient evidence.
Plan before you search: Agents that plan sub-questions upfront consistently outperform agents that search reactively. Spend one LLM call decomposing the question into 3-5 sub-questions before issuing any searches. This focuses the search and prevents circular querying.
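A minimal sketch of this planning step, assuming the planning prompt has asked the model for a numbered list of sub-questions (the prompt wording and parser below are illustrative, not a fixed API):

```python
import re

# Illustrative prompt for the single upfront planning call.
PLAN_PROMPT = (
    "Before searching, decompose this research question into 3-5 specific "
    "sub-questions, one per line, numbered:\n\n{question}"
)

def parse_subquestions(plan_text: str) -> list[str]:
    """Extract numbered sub-questions from the model's planning response."""
    subqs = []
    for line in plan_text.splitlines():
        m = re.match(r"\s*\d+[.)]\s+(.*\S)", line)  # matches "1. ..." or "2) ..."
        if m:
            subqs.append(m.group(1))
    return subqs
```

Each parsed sub-question then drives its own targeted search, which is what keeps the loop from wandering.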
SECTION 03
Search Tools
The tools available to the agent determine the quality ceiling of its research. More diverse tools = broader coverage.
Core tools:
- web_search(query, num_results): General web search. Use Tavily, Brave Search API, or Serper (Google Search API wrapper). Tavily is optimized for LLM agents: it returns clean text extracts rather than raw HTML.
- read_url(url): Fetch and extract the full text content of a URL. Essential for reading past the snippet to get full context.
- news_search(query, from_date, to_date): Search recent news with date filtering. Important for questions with temporal relevance.
Specialized tools for domain research:
- arxiv_search(query): Academic papers in STEM fields. Use for any research question requiring scientific evidence.
- sec_search(query): SEC EDGAR filings. For public company financial research.
- pubmed_search(query): Biomedical literature. For healthcare and life science questions.
- internal_kb_search(query): Your organization's internal documents. Combine with web search for questions mixing internal and external knowledge.
# Tavily search integration (optimized for LLM agents)
# pip install tavily-python
from tavily import TavilyClient

tavily = TavilyClient(api_key="YOUR_KEY")

def web_search(query: str, max_results: int = 5) -> str:
    results = tavily.search(
        query=query,
        max_results=max_results,
        include_raw_content=False,  # Set True to get full page text
        search_depth="advanced",    # "basic" is faster/cheaper
    )
    formatted = []
    for r in results.get("results", []):
        formatted.append(f"[{r['url']}]\n{r['title']}\n{r['content']}\n")
    return "\n---\n".join(formatted) if formatted else "No results found."
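For the read_url tool, a standard-library-only sketch is below; production systems usually reach for a dedicated article extractor for cleaner text, so treat this as a minimal baseline (the truncation limit and User-Agent string are illustrative choices):

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style content."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

def read_url(url: str, max_chars: int = 20_000) -> str:
    req = Request(url, headers={"User-Agent": "research-agent/0.1"})
    with urlopen(req, timeout=15) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Truncate so one long page cannot swallow the agent's context budget
    return html_to_text(html)[:max_chars]
```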
SECTION 04
Stopping Criteria
One of the most important design decisions in an agentic search system: when does the agent stop? An agent that stops too early gives incomplete answers. One that never stops exhausts the budget and user patience.
Explicit stopping signals: The cleanest approach is to give the agent a finish(summary) tool that it calls when it decides it has sufficient information. Combine with a max_steps hard limit as a safety net.
Coverage-based stopping: After each round, ask the agent to list remaining information gaps. When the gap list is empty, stop. This is more reliable than asking "are you done?" which LLMs tend to answer affirmatively.
Confidence scoring: Ask the agent to rate its confidence (0-1) in its current answer at each step. Stop when confidence exceeds a threshold (e.g. 0.85) or when additional search rounds don't increase confidence.
Diminishing returns detection: Track whether each search round added new, non-redundant information. If two consecutive rounds return only information already in the context, stop.
# Stopping with a finish tool
TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for information.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "finish",
        "description": (
            "Call this when you have gathered sufficient information to answer "
            "the research question comprehensively. Pass your final synthesized answer."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string", "description": "Complete, cited research synthesis"},
                "confidence": {"type": "number", "description": "Confidence 0.0-1.0 in completeness"},
            },
            "required": ["answer"],
        },
    },
]
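Diminishing-returns detection can be sketched with a crude token-overlap measure (the 0.8 redundancy threshold and two-round patience are illustrative choices; production systems might use embeddings instead):

```python
def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

class NoveltyTracker:
    """Stop after N consecutive search rounds that add nothing new."""
    def __init__(self, redundancy_threshold: float = 0.8, patience: int = 2):
        self.seen: set[str] = set()
        self.threshold = redundancy_threshold
        self.patience = patience
        self.redundant_streak = 0

    def should_stop(self, round_text: str) -> bool:
        toks = _tokens(round_text)
        if not toks:
            return False
        # Fraction of this round's tokens already seen in earlier rounds
        overlap = len(toks & self.seen) / len(toks)
        self.seen |= toks
        if overlap >= self.threshold:
            self.redundant_streak += 1
        else:
            self.redundant_streak = 0
        return self.redundant_streak >= self.patience
```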
SECTION 05
Implementation
A complete agentic search agent with web search and full URL reading:
import anthropic

client = anthropic.Anthropic()

# Mock search (replace with Tavily/Brave/Serper)
def web_search(query: str) -> str:
    return f"[Mock] Searching: {query}\nResult: Found 3 articles discussing {query}."

def read_url(url: str) -> str:
    return f"[Mock] Full article from {url}: Detailed content about the topic..."

TOOLS = [
    {"name": "web_search", "description": "Search the web for current information.",
     "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}},
    {"name": "read_url", "description": "Read the full content of a URL for detailed information.",
     "input_schema": {"type": "object", "properties": {"url": {"type": "string"}}, "required": ["url"]}},
    {"name": "finish", "description": "Return the final research synthesis when done gathering information.",
     "input_schema": {"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}},
]

SYSTEM = """You are a deep research agent. When given a research question:
1. First, plan: decompose the question into 3-5 specific sub-questions
2. Search for each sub-question using targeted queries
3. Read full articles from the most relevant URLs
4. Cross-reference and verify key claims across multiple sources
5. When you have comprehensive coverage, call finish() with a well-cited synthesis
Always cite sources with URLs in your final answer."""

def agentic_search(question: str, max_steps: int = 15) -> str:
    messages = [{"role": "user", "content": question}]
    searches_done = set()  # queries issued so far (useful for dedup and logging)
    urls_read = set()
    for step in range(max_steps):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=3000,
            system=SYSTEM,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason == "end_turn":
            # Model finished without calling the finish tool
            for block in reversed(resp.content):
                if hasattr(block, "text"):
                    return block.text
            break
        tool_results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            if block.name == "finish":
                return block.input.get("answer", "Research complete.")
            elif block.name == "web_search":
                query = block.input["query"]
                searches_done.add(query)
                result = web_search(query)
                print(f"[search] {query}")
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})
            elif block.name == "read_url":
                url = block.input["url"]
                urls_read.add(url)
                content = read_url(url)
                print(f"[read] {url}")
                tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": content})
        if tool_results:
            messages.append({"role": "user", "content": tool_results})
    return "Research agent reached step limit."

if __name__ == "__main__":
    result = agentic_search(
        "What are the key differences between RAG architectures and "
        "what are the production tradeoffs as of 2025-2026?"
    )
    print(result)
SECTION 06
Agentic vs Standard RAG
Understanding when to use agentic search versus standard RAG is essential for cost-effective system design.
Use standard RAG when: The question has a clear, well-scoped answer. Your knowledge base is comprehensive for the domain. Users need answers in under 2 seconds. The question requires information from your internal documents only. Cost is a primary constraint.
Use Agentic Search when: The question requires synthesizing information from multiple perspectives or sources. The answer is not fully contained in any single document. Currency matters: you need recent web information beyond the model's training data. The user is doing research, not just fact-lookup. Quality and completeness matter more than latency.
Latency comparison:
- Standard RAG: 0.5-2 seconds
- Agentic RAG (internal KB): 5-30 seconds
- Agentic Search (web): 30 seconds to 5 minutes for deep research
Cost comparison (rough):
- Standard RAG: $0.01-0.05 per query
- Agentic Search: $0.10-2.00 per research task depending on depth
Hybrid approach: Many production systems combine both. Route simple, well-scoped questions to standard RAG (fast, cheap). Route open-ended research questions to agentic search (thorough, expensive). A router classifier can automate this with ~90% accuracy.
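The routing decision can be sketched as below. A production router would typically be a small LLM or trained classifier; this keyword heuristic (cue list and word-count cutoff are illustrative) just shows the shape of the decision boundary:

```python
# Surface cues suggesting an open-ended research question (illustrative).
RESEARCH_CUES = (
    "compare", "tradeoff", "trade-off", "comprehensive", "landscape",
    "state of the art", "pros and cons", "as of", "latest", "recent",
)

def route(question: str) -> str:
    """Return 'agentic_search' for open-ended research, else 'standard_rag'."""
    q = question.lower()
    if any(cue in q for cue in RESEARCH_CUES) or len(q.split()) > 25:
        return "agentic_search"
    return "standard_rag"
```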
SECTION 07
Production Patterns
Running agentic search reliably in production requires handling failure modes that don't exist in simpler systems.
Query deduplication: Track all queries issued in the current session. If the agent attempts to issue a query it has already run (or a very similar one), return the cached result and note it was already searched. This prevents expensive circular search loops.
Source quality filtering: Not all web sources are equal. Maintain a domain allowlist (for high-stakes research) or blocklist (for known low-quality sources). Pass source metadata to the agent so it can weight sources appropriately.
Conflict resolution: When the agent finds contradictory information, instruct it to explicitly flag the conflict, note the sources holding each position, and assess which is more authoritative rather than silently picking one. This makes the output auditable.
Streaming results: For research tasks taking 1+ minutes, stream intermediate progress to the user ("Searching for X...", "Reading article Y...", "Found 3 sources on Z..."). Users tolerate long waits much better when they can see progress.
Citation format: Enforce citation format in the system prompt. Every factual claim should include an inline citation [Source: URL]. Post-process the final answer to verify all URLs are reachable and map to the claimed content.
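A post-processing sketch for the reachability check, assuming the [Source: URL] citation convention described above (the HEAD-request check verifies only that the URL responds, not that it supports the claim):

```python
import re
import urllib.request

CITATION_RE = re.compile(r"\[Source:\s*(https?://\S+?)\]")

def extract_citations(answer: str) -> list[str]:
    """Pull all cited URLs out of a final answer."""
    return CITATION_RE.findall(answer)

def verify_citations(answer: str, timeout: float = 10.0) -> dict[str, bool]:
    """Map each cited URL to whether it is currently reachable."""
    status = {}
    for url in extract_citations(answer):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                status[url] = resp.status < 400
        except Exception:
            status[url] = False
    return status
```

Checking that the page actually contains the claimed content is a harder verification step, typically done with another LLM call over the fetched text.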
Transparency builds trust: Show users what the agent searched for and which sources it relied on. Auditable search traces dramatically increase user confidence in agentic search outputs, especially for high-stakes research.