Agentic Search in the Wild: Intents and Trajectory Dynamics from 14M+ Real Search Requests
January 24, 2026
Authors: Jingjie Ning, João Coelho, Yibo Kong, Yunfan Long, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong
cs.AI
Abstract
LLM-powered search agents are increasingly used for multi-step information-seeking tasks, yet the IR community lacks an empirical understanding of how agentic search sessions unfold and how retrieved evidence is used. This paper presents a large-scale log analysis of agentic search based on 14.44M search requests (3.97M sessions) collected from DeepResearchGym, an open-source search API accessed by external agentic clients. We sessionize the logs, assign session-level intents and step-wise query-reformulation labels using LLM-based annotation, and propose the Context-driven Term Adoption Rate (CTAR) to quantify whether newly introduced query terms are traceable to previously retrieved evidence. Our analyses reveal three distinctive behavioral patterns. First, over 90% of multi-turn sessions contain at most ten steps, and 89% of inter-step intervals are under one minute. Second, behavior varies by intent: fact-seeking sessions exhibit high repetition that increases over time, while sessions requiring reasoning sustain broader exploration. Third, agents reuse evidence across steps: on average, 54% of newly introduced query terms appear in the accumulated evidence context, and earlier steps contribute terms beyond those found in the most recent retrieval. These findings suggest that agentic search may benefit from repetition-aware early stopping, intent-adaptive retrieval budgets, and explicit cross-step context tracking. We plan to release the anonymized logs to support future research.
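To make the idea behind CTAR concrete, the following is a minimal sketch of one plausible reading of the metric: for each step after the first, the share of query terms not seen in any earlier query that do appear in the evidence retrieved at earlier steps. The abstract does not give the exact definition, so the tokenizer, the function names `tokenize` and `ctar_per_step`, and the handling of steps with no new terms are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, simplified tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ctar_per_step(queries, evidence):
    """Hypothetical CTAR-style rate for each step after the first.

    queries[i] is the query issued at step i and evidence[i] is the text
    retrieved for it; both lists must have the same length. For step i >= 1
    the result holds the fraction of newly introduced query terms that occur
    in evidence from steps 0..i-1, or None if the step adds no new terms.
    """
    seen_terms = set()      # terms used in any earlier query
    context_terms = set()   # terms in evidence accumulated so far
    rates = []
    for i, (query, docs) in enumerate(zip(queries, evidence)):
        terms = tokenize(query)
        if i > 0:
            new_terms = terms - seen_terms
            if new_terms:
                rates.append(len(new_terms & context_terms) / len(new_terms))
            else:
                rates.append(None)
        # Update state only after scoring, so a step never sees its own results.
        seen_terms |= terms
        context_terms |= tokenize(docs)
    return rates
```

Because the context set accumulates over all earlier steps, this sketch credits terms that come from older retrievals as well as from the most recent one, which is the distinction the abstract draws. Restricting `context_terms` to the previous step's evidence would give a last-step-only variant for comparison.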