HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

August 11, 2025
Authors: Jiejun Tan, Zhicheng Dou, Yan Yu, Jiehan Cheng, Qiang Ju, Jian Xie, Ji-Rong Wen
cs.AI

Abstract

Large reasoning models have recently demonstrated strong mathematical and coding abilities, and deep search leverages these reasoning capabilities in challenging information retrieval tasks. Existing deep search work is generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can use search tools over both local and Web corpora. Training a single agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it suffers from low training data efficiency and poor mastery of complex tools. To address these issues, we propose HierSearch, a hierarchical agentic deep search framework trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates the low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by the low-level agents. Experiments show that HierSearch outperforms flat RL, as well as various deep search and multi-source retrieval-augmented generation baselines, on six benchmarks across general, finance, and medical domains.
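
The two-level layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agents here are hard-coded stand-ins rather than RL-trained models, the refiner is a simple keyword-overlap filter, and every name (`Evidence`, `local_agent`, `web_agent`, `refine`, `planner`) and every sample string is hypothetical. The sketch also stops at the refined evidence and omits final answer generation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    source: str  # which low-level agent produced it ("local" or "web")
    text: str


# A low-level agent maps a query to a list of evidence snippets.
Agent = Callable[[str], List[Evidence]]


def tokens(text: str) -> set:
    # Lowercased alphanumeric tokens, punctuation stripped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def local_agent(query: str) -> List[Evidence]:
    # Stand-in for a local deep search agent (hypothetical private corpus).
    corpus = {"revenue": "Internal report: Q2 revenue was 12M."}
    return [Evidence("local", t) for k, t in corpus.items() if k in query.lower()]


def web_agent(query: str) -> List[Evidence]:
    # Stand-in for a Web deep search agent (hypothetical search results).
    return [
        Evidence("web", "Industry average Q2 revenue growth was 5%."),
        Evidence("web", "Unrelated page about cooking recipes."),
    ]


def refine(query: str, evidence: List[Evidence]) -> List[Evidence]:
    # Toy refiner: keep evidence that shares a longer term with the query.
    # Assumption: keyword overlap is only a placeholder for real filtering
    # of hallucinated and irrelevant evidence.
    terms = {w for w in tokens(query) if len(w) > 3}
    return [e for e in evidence if terms & tokens(e.text)]


def planner(question: str, agents: Dict[str, Agent]) -> List[Evidence]:
    # Toy planner: call every low-level agent, then refine what they return.
    gathered: List[Evidence] = []
    for agent in agents.values():
        gathered.extend(agent(question))
    return refine(question, gathered)


if __name__ == "__main__":
    agents = {"local": local_agent, "web": web_agent}
    for item in planner("What was Q2 revenue growth?", agents):
        print(item.source, "|", item.text)
```

In this toy setup the planner always queries both agents once; a trained planner would instead decide which agent to call, with what sub-query, and when to stop.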