AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

September 29, 2025
Authors: Ran Xu, Yuchen Zhuang, Zihan Dong, Jonathan Wang, Yue Yu, Joyce C. Ho, Linjun Zhang, Haoyu Wang, Wenqi Shi, Carl Yang
cs.AI

Abstract

Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final-answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact-match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9× more parameters, highlighting its efficiency and effectiveness in tackling complex reasoning tasks. Our code will be published at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.
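The Python sketch below illustrates the idea of one model playing two roles and receiving a reward that depends only on the final answer. It is not the AceSearcher implementation. Several details are hypothetical and chosen only for illustration:

- the prompt tags `[DECOMPOSER]` and `[SOLVER]`;
- the `LLM` and `Retriever` callables;
- the one-sub-question-per-line output format;
- the `toy_llm` and `toy_retrieve` stubs.

Exact match uses the common SQuAD-style normalization, which may differ from the paper's evaluation script.

```python
import re
import string
from typing import Callable, List, Tuple

# Hypothetical interfaces: one LLM (prompt -> completion) plays both roles,
# and a retriever maps a query to a list of passages.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def decompose(llm: LLM, question: str) -> List[str]:
    """Decomposer role: ask the model for sub-questions, one per line (assumed format)."""
    raw = llm(f"[DECOMPOSER] Break the question into sub-questions:\n{question}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def solve(llm: LLM, retrieve: Retriever, question: str, subqs: List[str]) -> str:
    """Solver role: answer each sub-question over retrieved passages, then the original question."""
    notes: List[str] = []
    for sq in subqs:
        context = "\n".join(retrieve(sq))
        history = "\n".join(notes)  # earlier sub-answers let later hops resolve references
        answer = llm(f"[SOLVER] Context:\n{context}\nKnown facts:\n{history}\nQ: {sq}\nA:")
        notes.append(f"{sq} -> {answer.strip()}")
    history = "\n".join(notes)
    return llm(f"[SOLVER] Known facts:\n{history}\nQ: {question}\nFinal answer:").strip()


def rollout_reward(llm: LLM, retrieve: Retriever, question: str,
                   gold: str) -> Tuple[List[str], str, float]:
    """One self-play episode. The only supervision is final-answer exact match,
    so no sub-question or intermediate-answer labels are needed."""
    subqs = decompose(llm, question)
    final = solve(llm, retrieve, question, subqs)
    return subqs, final, exact_match(final, gold)


# Toy stubs standing in for a real model and search engine (hypothetical).
def toy_llm(prompt: str) -> str:
    if prompt.startswith("[DECOMPOSER]"):
        return "Who directed Inception?\nWhere was that director born?"
    if "Q: Who directed Inception?" in prompt:
        return "Christopher Nolan"
    if "Q: Where was that director born?" in prompt:
        return "London"
    return "London."


def toy_retrieve(query: str) -> List[str]:
    return [f"passage about: {query}"]


if __name__ == "__main__":
    print(rollout_reward(toy_llm, toy_retrieve,
                         "Where was the director of Inception born?", "london"))
```

The episode returns a single scalar reward. That scalar can, in principle, credit both the decomposer's output (`subqs`) and the solver's output (`final`). How the reward is converted into a policy update is not shown in this sketch.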
September 30, 2025