AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
September 29, 2025
Authors: Ran Xu, Yuchen Zhuang, Zihan Dong, Jonathan Wang, Yue Yu, Joyce C. Ho, Linjun Zhang, Haoyu Wang, Wenqi Shi, Carl Yang
cs.AI
Abstract
Search-augmented LLMs often struggle with complex reasoning tasks due to
ineffective multi-hop retrieval and limited reasoning ability. We propose
AceSearcher, a cooperative self-play framework that trains a single large
language model (LLM) to alternate between two roles: a decomposer that breaks
down complex queries and a solver that integrates retrieved contexts for answer
generation. AceSearcher couples supervised fine-tuning on a diverse mixture of
search, reasoning, and decomposition tasks with reinforcement fine-tuning
optimized for final answer accuracy, eliminating the need for intermediate
annotations. Extensive experiments on three reasoning-intensive tasks across 10
datasets show that AceSearcher outperforms state-of-the-art baselines,
achieving an average exact match improvement of 7.6%. Remarkably, on
document-level finance reasoning tasks, AceSearcher-32B matches the performance
of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller
scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented
LLMs with up to 9x more parameters, highlighting its exceptional efficiency and
effectiveness in tackling complex reasoning tasks. Our code will be published
at https://github.com/ritaranx/AceSearcher and
https://huggingface.co/AceSearcher.
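A minimal, hypothetical sketch of the decompose-then-solve episode and the final-answer-only reward described in the abstract follows. It is not the released AceSearcher code. The role functions (`decompose`, `retrieve`, `solve`), the `#k` placeholder convention for referring to earlier sub-answers, the toy corpus, and the use of SQuAD-style exact match as the reward are all assumptions made for illustration. In the framework itself, a single fine-tuned LLM plays both the decomposer and solver roles.

```python
import re
import string
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, squash spaces."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


@dataclass
class Trajectory:
    question: str
    sub_questions: List[str] = field(default_factory=list)
    sub_answers: List[str] = field(default_factory=list)
    final_answer: str = ""


Decomposer = Callable[[str], List[str]]
Retriever = Callable[[str], List[str]]
# solve(question, passages, previous_answers); passages=None marks the final-answer step.
Solver = Callable[[str, Optional[List[str]], List[str]], str]


def fill_refs(sub_q: str, answers: List[str]) -> str:
    """Replace hypothetical '#k' placeholders with the k-th earlier sub-answer."""
    for k in range(len(answers), 0, -1):  # highest k first so '#1' never clobbers '#10'
        sub_q = sub_q.replace(f"#{k}", answers[k - 1])
    return sub_q


def run_episode(question: str, decompose: Decomposer,
                retrieve: Retriever, solve: Solver) -> Trajectory:
    traj = Trajectory(question=question)
    for template in decompose(question):  # decomposer role: split the complex query
        sub_q = fill_refs(template, traj.sub_answers)
        traj.sub_questions.append(sub_q)
        passages = retrieve(sub_q)
        # Solver role: answer each sub-question from its retrieved passages.
        traj.sub_answers.append(solve(sub_q, passages, traj.sub_answers))
    # Solver role again: produce the final answer from the chain of sub-answers.
    traj.final_answer = solve(question, None, traj.sub_answers)
    return traj


def outcome_reward(traj: Trajectory, gold: str) -> float:
    """Only the final answer is scored; sub-questions and sub-answers carry no labels."""
    return exact_match(traj.final_answer, gold)


# ---- Toy stand-ins (hypothetical) for the LLM roles and the retriever ----
CORPUS: Dict[str, str] = {
    "Who wrote Book X?": "Book X was written by Alice Smith.",
    "Where was Alice Smith born?": "Alice Smith was born in Lisbon.",
}


def toy_retrieve(sub_q: str) -> List[str]:
    return [CORPUS[sub_q]] if sub_q in CORPUS else []


def toy_solve(question: str, passages: Optional[List[str]], previous: List[str]) -> str:
    if passages is None:  # final step: reuse the last intermediate answer
        return previous[-1] if previous else ""
    for passage in passages:
        text = passage.rstrip(".")
        for marker in (" written by ", " born in "):
            if marker in text:
                return text.split(marker, 1)[1]
    return ""


question = "Where was the author of Book X born?"
good = run_episode(question, lambda q: ["Who wrote Book X?", "Where was #1 born?"],
                   toy_retrieve, toy_solve)
bad = run_episode(question, lambda q: ["Who wrote Book X?"], toy_retrieve, toy_solve)
print(good.final_answer, outcome_reward(good, "Lisbon"))  # Lisbon 1.0
print(bad.final_answer, outcome_reward(bad, "Lisbon"))    # Alice Smith 0.0
```

In this sketch, the one-hop decomposition in `bad` answers its only sub-question correctly but still receives a reward of 0.0. This shows how a reward computed solely on the final answer can steer the decomposer without intermediate annotations. The reinforcement fine-tuning step that would consume these rewards is deliberately left out.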