Towards Interaction Space Retrieval for Agentic Search

Abstract

Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the corpus, and the agent reads the small set of documents it returns. Recent direct corpus interaction (DCI) work shows that agents can instead interact with the raw corpus through shell tools such as grep and file reads. But unbounded interaction does not scale: every broad shell command scans the whole corpus, so latency degrades sharply as the corpus grows. We argue that the role of retrieval in agentic search is not just to select documents that fit in the LLM context window, but to construct an interaction space: a bounded subset of the corpus that the agent can explore with associated tools. Two design consequences follow: the space needs a boundary supplied by retrieval, and the objects within it should be processed for interaction. As a proof of concept, we propose RISE (Retrieving Interaction SpacE), which uses BM25 to construct the interaction space and processes its documents during indexing for shell-style navigation. On BrowseComp-Plus with gpt-5.4-mini, RISE matches the pure-shell DCI baseline's 78% accuracy at roughly one quarter of the per-query cost. At 1M documents, RISE-BM25 reaches 81% on gpt-5.4-mini, whereas DCI on gpt-5.4-nano degrades to 60%, with 33 of 100 wall-clock failures.
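To make the idea of a retrieval-bounded interaction space concrete, here is a minimal Python sketch under stated assumptions; it is not the RISE implementation. It assumes a plain Okapi BM25 scorer (with Lucene-style IDF and common defaults k1 = 1.5, b = 0.75) that selects the top-k documents, and it assumes that "processing for shell-style navigation" means splitting each document into lines at index time so that grep-like and sed-like tools can report and read line ranges. The names `tokenize`, `BM25`, `InteractionSpace`, `build_space`, the toy corpus, and `k=2` are all hypothetical illustrations.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; deliberately simple.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over an in-memory corpus (k1, b are common defaults)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.doc_len = {d: sum(c.values()) for d, c in self.tf.items()}
        self.avgdl = sum(self.doc_len.values()) / max(len(self.doc_len), 1)
        # Document frequency: iterating a Counter yields each distinct term once.
        df = Counter(term for c in self.tf.values() for term in c)
        n = len(docs)
        # Lucene-style IDF, which stays non-negative even for common terms.
        self.idf = {t: math.log(1 + (n - m + 0.5) / (m + 0.5)) for t, m in df.items()}

    def search(self, query: str, k: int) -> list[str]:
        """Return the ids of the top-k documents with a positive score."""
        scores: dict[str, float] = {}
        for d, counts in self.tf.items():
            norm = self.k1 * (1 - self.b + self.b * self.doc_len[d] / self.avgdl)
            s = sum(
                self.idf[q] * counts[q] * (self.k1 + 1) / (counts[q] + norm)
                for q in tokenize(query)
                if counts[q] > 0
            )
            if s > 0:
                scores[d] = s
        return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


@dataclass
class InteractionSpace:
    """Bounded corpus subset; the agent's tools never touch other documents."""

    lines: dict[str, list[str]] = field(default_factory=dict)

    def grep(self, pattern: str) -> list[tuple[str, int, str]]:
        # Roughly `grep -in PATTERN` over the retrieved documents only.
        rx = re.compile(pattern, re.IGNORECASE)
        return [
            (d, i, line)
            for d, doc_lines in self.lines.items()
            for i, line in enumerate(doc_lines, start=1)
            if rx.search(line)
        ]

    def read(self, doc_id: str, start: int = 1, end: int | None = None) -> list[str]:
        # Roughly `sed -n 'START,ENDp'`: 1-indexed, inclusive line range.
        return self.lines[doc_id][start - 1 : end]


def build_space(index: BM25, processed: dict[str, list[str]], query: str, k: int) -> InteractionSpace:
    # Retrieval supplies the boundary: only the top-k BM25 hits enter the space.
    return InteractionSpace({d: processed[d] for d in index.search(query, k)})


corpus = {
    "d1": "RISE builds a bounded interaction space.\nIt uses BM25 for retrieval.",
    "d2": "Direct corpus interaction scans the whole corpus with grep.",
    "d3": "An unrelated note about cooking pasta.",
}
# Hypothetical index-time processing: split each document into lines so that
# grep-style hits can be reported as (document, line number, text).
processed = {d: text.splitlines() for d, text in corpus.items()}
index = BM25(corpus)
space = build_space(index, processed, "bm25 interaction space", k=2)
print(sorted(space.lines))      # ['d1', 'd2']
print(space.grep(r"bm25"))      # [('d1', 2, 'It uses BM25 for retrieval.')]
print(space.grep(r"pasta"))     # [] because d3 lies outside the space
```

The design point the sketch tries to isolate is the boundary: each `grep` call scans only the k retrieved documents, so its cost grows with k rather than with corpus size, which is the scaling argument the abstract makes against unbounded shell interaction over the raw corpus.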