Toward Retrieving Interaction Spaces for Agentic Search

Abstract

Retrieval for search agents still follows the paradigm inherited from non-agentic information retrieval: a retriever ranks the corpus and the agent reads a small set of returned documents. Recent work on direct corpus interaction (DCI) shows that agents can instead interact with the raw corpus through shell tools such as grep and file reads. But unbounded interaction does not scale: every broad shell command scans the whole corpus, so latency degrades sharply as the corpus grows. We argue that the role of retrieval in agentic search is not only to select documents that fit in the LLM context window, but also to construct an interaction space: a bounded subset of the corpus that the agent can explore with associated tools. Two design consequences follow: the space needs a boundary supplied by retrieval, and the objects within it should be processed for interaction. As a proof of concept, we propose RISE (Retrieving Interaction SpacE), which uses BM25 to construct the interaction space and processes its documents at indexing time for shell-style navigation. On BrowseComp-Plus, RISE matches the pure-shell DCI baseline at 78% accuracy with gpt-5.4-mini at roughly one quarter of the per-query cost. At 1M documents, RISE-BM25 reaches 81% with gpt-5.4-mini, whereas DCI with gpt-5.4-nano degrades to 60%, with wall-clock failures in 33 of 100 cases.
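The idea of a retrieval-supplied boundary can be illustrated with a minimal sketch. This is not the RISE implementation: the function names (`bm25_scores`, `build_interaction_space`, `grep`), the tokenizer, and the BM25 parameters `k1` and `b` are assumptions chosen for illustration, and the in-memory `grep` merely stands in for the shell tools an agent would use. The indexing-time processing of documents for navigation is not modeled here. The sketch only shows that, once BM25 has bounded the space, a broad pattern search touches `size` documents rather than the whole corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` (id -> text) against `query` with Okapi BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Fall back to 1.0 so an empty corpus does not divide by zero.
    avg_len = (sum(len(t) for t in tokenized.values()) / max(n_docs, 1)) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def build_interaction_space(query, docs, size):
    """Keep only the `size` highest-scoring documents (hypothetical helper)."""
    scores = bm25_scores(query, docs)
    ranked = sorted(docs, key=lambda doc_id: scores[doc_id], reverse=True)
    return {doc_id: docs[doc_id] for doc_id in ranked[:size]}


def grep(space, pattern):
    """Return (doc_id, line_number, line) for each line in `space` matching `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for doc_id, text in space.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((doc_id, line_no, line))
    return hits
```

In this sketch the agent would call `build_interaction_space` once per query and then issue any number of `grep` calls against the returned subset, so the cost of each call depends on `size` rather than on the corpus size.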