
Learning to Retrieve from Agent Trajectories

March 30, 2026
作者: Yuqi Zhou, Sunhao Dai, Changle Qu, Liang Pang, Jun Xu, Ji-Rong Wen
cs.AI

Abstract

Information retrieval (IR) systems have traditionally been designed and trained for human users, with learning-to-rank methods relying heavily on large-scale human interaction logs such as clicks and dwell time. With the rapid emergence of large language model (LLM) powered search agents, however, retrieval is increasingly consumed by agents rather than human beings, and is embedded as a core component within multi-turn reasoning and action loops. In this setting, retrieval models trained under human-centric assumptions exhibit a fundamental mismatch with the way agents issue queries and consume results. In this work, we argue that retrieval models for agentic search should be trained directly from agent interaction data. We introduce learning to retrieve from agent trajectories as a new training paradigm, where supervision is derived from multi-step agent interactions. Through a systematic analysis of search agent trajectories, we identify key behavioral signals that reveal document utility, including browsing actions, unbrowsed rejections, and post-browse reasoning traces. Guided by these insights, we propose LRAT, a simple yet effective framework that mines high-quality retrieval supervision from agent trajectories and incorporates relevance intensity through weighted optimization. Extensive experiments on both in-domain and out-of-domain deep research benchmarks demonstrate that retrievers trained with LRAT consistently improve evidence recall, end-to-end task success, and execution efficiency across diverse agent architectures and scales. Our results highlight agent trajectories as a practical and scalable supervision source, pointing to a promising direction for retrieval in the era of agentic search.
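To make the supervision-mining idea concrete, here is a minimal sketch of how the behavioral signals named above (browsing actions, unbrowsed rejections, post-browse reasoning traces) could be turned into weighted training tuples. The trajectory schema, field names, and the specific weight values are assumptions for illustration, not the paper's actual LRAT implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical schema for one retrieval step in an agent trajectory.
# Field names are assumptions; real trajectories may differ.
@dataclass
class Step:
    query: str            # query the agent issued
    retrieved: List[str]  # doc ids the retriever returned
    browsed: List[str]    # docs the agent chose to open
    cited: List[str]      # docs the agent's post-browse reasoning relied on

def mine_supervision(
    trajectory: List[Step],
) -> List[Tuple[str, str, int, float]]:
    """Mine (query, doc, label, weight) tuples from agent behavior.

    Assumed relevance intensities (illustrative, not from the paper):
      cited after reasoning     -> positive, weight 1.0
      browsed but never cited   -> weak positive, weight 0.5
      retrieved, never browsed  -> negative (unbrowsed rejection), weight 1.0
    The weights would feed a weighted ranking loss during retriever training.
    """
    examples = []
    for step in trajectory:
        for doc in step.retrieved:
            if doc in step.cited:
                examples.append((step.query, doc, 1, 1.0))
            elif doc in step.browsed:
                examples.append((step.query, doc, 1, 0.5))
            else:
                examples.append((step.query, doc, 0, 1.0))
    return examples
```

Each tuple can then be consumed by a standard learning-to-rank objective, with the weight scaling each example's contribution so that strongly evidenced positives (cited documents) dominate weak ones (merely browsed documents).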
PDF · April 9, 2026