Learning to Retrieve from Agent Trajectories
March 30, 2026
Authors: Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
cs.AI
Abstract
Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility, including browsing actions, unbrowsed rejections, and post-browse reasoning traces. Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization. Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve evidence recall, end-to-end task success, and execution efficiency across diverse agent architectures and scales. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.
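To make the idea of trajectory-derived supervision concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the trajectory format or the loss, so everything here is an assumption: the `SearchStep` structure, the choice to treat browsed documents as positives and retrieved-but-unbrowsed documents as negatives, the per-document utility weight (standing in for a relevance intensity that might be derived from post-browse reasoning), and the use of a weighted softmax contrastive loss as the form of "weighted optimization".

```python
import math
from dataclasses import dataclass


@dataclass
class SearchStep:
    """One search turn in a hypothetical agent trajectory (assumed format)."""
    query: str
    retrieved: list[str]        # doc ids returned to the agent
    browsed: dict[str, float]   # doc id -> assumed utility weight in (0, 1]


def mine_examples(trajectory):
    """Turn search steps into (query, positive, negatives, weight) tuples.

    Assumption: a browsed document is a positive, and a document that was
    retrieved but never opened is treated as a rejected negative.
    """
    examples = []
    for step in trajectory:
        negatives = [d for d in step.retrieved if d not in step.browsed]
        for doc_id, weight in step.browsed.items():
            examples.append((step.query, doc_id, negatives, weight))
    return examples


def weighted_contrastive_loss(batch):
    """Weight-normalized mean of per-example softmax cross-entropy losses.

    Each batch item is (pos_score, neg_scores, weight), where the scores are
    query-document similarities produced by the retriever being trained.
    """
    total, weight_sum = 0.0, 0.0
    for pos, negs, w in batch:
        scores = [pos] + list(negs)
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += w * (log_z - pos)  # -log softmax probability of the positive
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

In this sketch, a positive with a higher weight contributes more to the loss, so documents the agent found more useful would pull the retriever harder than documents it merely opened. In practice the scores would come from a trainable encoder and the loss would be computed in an autodiff framework; plain Python is used here only to keep the arithmetic visible.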