TreeSeeker: Tree-Structured Trial and Error with Return in Deep Search

Abstract

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to proceed when several directions look plausible but only some of them later lead to reliable evidence. An agent that greedily follows whichever direction currently looks best may keep extending a weak continuation, while one that explores without discipline may waste its budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial and error in deep search. TreeSeeker organizes search as a branch-and-return process over tree-structured states, where each branch represents a tentative direction for a sub-goal. At each round, TreeSearch reads all sub-goal trees, identifies the active goals, and uses textual upper-confidence-bound (UCB) signals of value, uncertainty, and risk to choose among three actions: exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive continuation and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so that trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.
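
To make the branch-and-return loop concrete, the following is a minimal sketch and not the authors' implementation. It assumes the textual signals of value, uncertainty, and risk have already been mapped to numbers in [0, 1]; the abstract describes them as textual, so this mapping is an assumption. The names `Branch`, `score`, `frontier`, and `choose_action`, the additive scoring rule, and the pruning threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Branch:
    """One tentative direction for a sub-goal, with its memory attached."""
    name: str
    value: float        # assumed numeric stand-in for the "value" signal
    uncertainty: float  # assumed numeric stand-in for the "uncertainty" signal
    risk: float         # assumed numeric stand-in for the "risk" signal
    parent: Optional["Branch"] = None
    children: List["Branch"] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)  # evidence, conflicts, failure cues
    pruned: bool = False

    def add_child(self, child: "Branch") -> "Branch":
        child.parent = self
        self.children.append(child)
        return child


def score(b: Branch, explore_weight: float = 1.0, risk_weight: float = 1.0) -> float:
    """UCB-style score: value plus an uncertainty bonus, minus a risk penalty."""
    return b.value + explore_weight * b.uncertainty - risk_weight * b.risk


def frontier(root: Branch) -> List[Branch]:
    """Collect unpruned nodes that have no unpruned children."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.pruned:
            continue
        live = [c for c in node.children if not c.pruned]
        if live:
            stack.extend(live)
        else:
            out.append(node)
    return out


def choose_action(root: Branch, current: Branch,
                  prune_below: float = 0.0) -> Tuple[str, Branch]:
    """Pick one of: exploit, explore, or prune-and-return."""
    if current.parent is not None and score(current) < prune_below:
        # Record the failure on the branch that produced it, then go back
        # to the branch point (the parent) instead of extending further.
        current.pruned = True
        current.notes.append("pruned: score below threshold")
        return "prune_and_return", current.parent
    best = max(frontier(root), key=score)
    # Exploit when the score is driven by value, explore when it is
    # driven mainly by uncertainty.
    action = "exploit" if best.value >= best.uncertainty else "explore"
    return action, best


# Hypothetical sub-goal tree with three candidate directions.
root = Branch("root", value=0.5, uncertainty=0.5, risk=0.0)
a = root.add_child(Branch("official-site", value=0.8, uncertainty=0.1, risk=0.1))
b = root.add_child(Branch("forum-thread", value=0.3, uncertainty=0.9, risk=0.2))
c = root.add_child(Branch("dead-link-chain", value=0.1, uncertainty=0.1, risk=0.9))
print(choose_action(root, current=a))
print(choose_action(root, current=c))
```

In this sketch the failure note stays on the pruned branch, which loosely mirrors the role the abstract gives to TreeMem: a later round can see why a direction was abandoned rather than rediscovering it.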