TreeSeeker: Tree-Structured Trial-and-Error and Backtracking in Deep Search

Abstract

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding where to search next when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the direction that currently looks best, it may keep extending a weak lead. If it explores without discipline, it may waste its budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial-and-error in deep search. TreeSeeker organizes search as a branch-and-return process over tree-structured states, where each branch is a tentative direction for a sub-goal. At each round, TreeSeeker reads all sub-goal trees, identifies the active goals, and uses textual UCB signals of value, uncertainty, and risk to choose among three actions: exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive lead and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so that trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.
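The choice among exploiting, exploring, and pruning can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes the UCB signals as textual, whereas the sketch below assumes they have been mapped to numbers in [0, 1]. The names `Branch`, `score`, `open_leaves`, and `choose_action`, the additive scoring form, and the `prune_below` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Branch:
    """One tentative direction for a sub-goal (hypothetical structure)."""
    name: str
    value: float        # assumed numeric stand-in: how promising the branch looks
    uncertainty: float  # assumed numeric stand-in: how little is known so far
    risk: float         # assumed numeric stand-in: chance of a dead end
    parent: Optional["Branch"] = None
    children: List["Branch"] = field(default_factory=list)
    pruned: bool = False

    def add_child(self, child: "Branch") -> "Branch":
        child.parent = self
        self.children.append(child)
        return child


def score(b: Branch, c: float = 1.0, risk_weight: float = 1.0) -> float:
    """UCB-style score: value plus an exploration bonus, minus a risk penalty."""
    return b.value + c * b.uncertainty - risk_weight * b.risk


def open_leaves(root: Branch) -> List[Branch]:
    """Collect unpruned leaves; a node whose children are all pruned is a leaf again."""
    if root.pruned:
        return []
    live = [leaf for child in root.children for leaf in open_leaves(child)]
    return live or [root]


def choose_action(root: Branch, prune_below: float = 0.0) -> Tuple[str, Branch]:
    """Pick one action for the next round on a single sub-goal tree."""
    leaves = open_leaves(root)
    worst = min(leaves, key=score)
    if score(worst) < prune_below and worst.parent is not None:
        # Drop the unproductive lead and return to the earlier branch point.
        worst.pruned = True
        return "prune_and_return", worst.parent
    best = max(leaves, key=score)
    # Explore when the branch is attractive mainly because it is unknown.
    if best.uncertainty > best.value:
        return "explore", best
    return "exploit", best


# Usage: three candidate leads under one sub-goal (all values are made up).
root = Branch("sub-goal", value=0.5, uncertainty=0.5, risk=0.1)
a = root.add_child(Branch("lead A", value=0.8, uncertainty=0.1, risk=0.1))
b = root.add_child(Branch("lead B", value=0.2, uncertainty=0.7, risk=0.2))
c = root.add_child(Branch("lead C", value=0.1, uncertainty=0.1, risk=0.9))
first = choose_action(root)   # lead C scores below the threshold and is pruned
second = choose_action(root)  # lead A has the highest score and is exploited
```

In this toy setup the first round prunes the high-risk lead and returns to the sub-goal node, and the second round exploits the strongest remaining lead. Keeping the signals on the branch objects loosely mirrors the idea of attaching trial outcomes to the branches that produced them, though the actual TreeMem design is not specified in the abstract.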