TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

Abstract

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-looking direction, it may keep extending a weak continuation. If it explores without discipline, it may waste budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial-and-error in deep search. TreeSeeker organizes the process as branch-and-return search over tree-structured states, where each branch is a tentative direction for a sub-goal. At each round, TreeSearch reads all sub-goal trees, identifies active goals, and uses textual UCB signals of value, uncertainty, and risk to select among exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive continuation and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.
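
The exploit, explore, and prune-and-return choice described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract describes textual UCB signals, whereas this sketch assumes they have already been mapped to numbers in [0, 1]. The `Branch` class, the `score` formula and its weights, and the `prune_below` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Branch:
    """A tentative direction for a sub-goal (hypothetical structure)."""
    name: str
    value: float        # how promising the evidence so far looks, in [0, 1]
    uncertainty: float  # how little is known about this direction, in [0, 1]
    risk: float         # how likely it is to be a dead end, in [0, 1]
    parent: Optional["Branch"] = None
    children: List["Branch"] = field(default_factory=list)
    pruned: bool = False
    notes: List[str] = field(default_factory=list)  # cues kept on the branch

    def add_child(self, child: "Branch") -> "Branch":
        child.parent = self
        self.children.append(child)
        return child


def score(b: Branch, c: float = 0.5, risk_weight: float = 0.5) -> float:
    """UCB-style score (assumed form): value + exploration bonus - risk penalty."""
    return b.value + c * b.uncertainty - risk_weight * b.risk


def open_leaves(node: Branch) -> List[Branch]:
    """Return the unpruned frontier below node; a node whose children are
    all pruned (or absent) is itself a candidate for further expansion."""
    live = [child for child in node.children if not child.pruned]
    if not live:
        return [node]
    leaves: List[Branch] = []
    for child in live:
        leaves.extend(open_leaves(child))
    return leaves


def decide(current: Branch, root: Branch,
           prune_below: float = 0.2) -> Tuple[str, Branch]:
    """Pick one action for this round: return, exploit, or explore."""
    # Prune an unproductive continuation and go back to its branch point.
    if current.parent is not None and score(current) < prune_below:
        current.pruned = True
        current.notes.append("pruned: score fell below threshold")
        return "return", current.parent
    candidates = open_leaves(root)
    chosen = max(candidates, key=score)
    top_value = max(candidates, key=lambda b: b.value)
    # Exploit if the pick is also the highest-value branch; otherwise the
    # uncertainty bonus drove the choice, which we label as exploration.
    action = "exploit" if chosen is top_value else "explore"
    return action, chosen


if __name__ == "__main__":
    root = Branch("root", value=0.5, uncertainty=0.5, risk=0.0)
    a = root.add_child(Branch("a", value=0.8, uncertainty=0.1, risk=0.1))
    b = root.add_child(Branch("b", value=0.5, uncertainty=0.9, risk=0.2))
    c = root.add_child(Branch("c", value=0.1, uncertainty=0.2, risk=0.6))
    for current in (a, c):
        action, target = decide(current, root)
        print(current.name, "->", action, target.name)
```

In this sketch the failure note stays on the pruned branch itself, loosely mirroring the idea that trial outcomes remain attached to the branches that produced them. A real system would presumably derive the three signals from model-written assessments rather than fixed numbers.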