Towards Retrieving Interaction Spaces for Agentic Search

June 5, 2026
Authors: Shengyao Zhuang, Yuansheng Ni, Hengxin Fun, Jimmy Lin, Xueguang Ma
cs.AI

Abstract

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the corpus and the agent reads a small set of returned documents. Recent work on direct corpus interaction (DCI) shows that agents can instead interact with the raw corpus through shell tools such as grep and file reads. But unbounded interaction does not scale: every broad shell command scans the whole corpus, and latency degrades sharply as the corpus grows. We argue that the role of retrieval in agentic search is not just to select documents that fit in the LLM context window, but to construct an interaction space: a bounded subset of the corpus that the agent can explore with associated tools. Two design consequences follow: the space needs a boundary supplied by retrieval, and the objects within it should be processed for interaction. As a proof of concept, we propose RISE (Retrieving Interaction SpacE), which uses BM25 to construct the interaction space and processes documents at indexing time to support shell-style navigation. On BrowseComp-Plus, RISE with gpt-5.4-mini matches the pure-shell DCI baseline at 78% accuracy at roughly one quarter of the per-query cost. At 1M documents, RISE-BM25 reaches 81% with gpt-5.4-mini, whereas DCI with gpt-5.4-nano degrades to 60%, with 33 of 100 wall-clock failures.
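
The idea of bounding the corpus with BM25 and then exploring only that subset with a grep-like tool can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `BM25` class, the helper names `build_interaction_space` and `grep`, and the parameter values `k1=1.2` and `b=0.75` are assumptions made for illustration, and the indexing-time document processing that RISE performs is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Plain Okapi BM25 over an in-memory corpus (hypothetical helper)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        self.tfs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lens) / max(len(self.lens), 1)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for tf in self.tfs for term in tf)

    def idf(self, term):
        n, df = len(self.doc_ids), self.df.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def score(self, query_terms, i):
        tf = self.tfs[i]
        norm = 1 - self.b + self.b * self.lens[i] / self.avg_len
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + self.k1 * norm)
            for t in query_terms
            if t in tf
        )

    def top_k(self, query, k):
        terms = tokenize(query)
        scored = [(self.score(terms, i), d) for i, d in enumerate(self.doc_ids)]
        # Sort by descending score, breaking ties by document id.
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [d for s, d in ranked[:k] if s > 0]


def build_interaction_space(docs, query, k):
    """Bound the corpus to the top-k BM25 hits; tools only see this subset."""
    retriever = BM25(docs)
    return {d: docs[d] for d in retriever.top_k(query, k)}


def grep(space, pattern):
    """grep-like tool over the bounded space: (doc_id, line_no, line) tuples."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (doc_id, n, line)
        for doc_id, text in space.items()
        for n, line in enumerate(text.splitlines(), start=1)
        if rx.search(line)
    ]
```

In this sketch, the cost of each `grep` call depends on the size of the space (at most `k` documents) rather than on the size of the full corpus, which is the scaling property the abstract attributes to a retrieval-supplied boundary.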