R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
May 22, 2025
Authors: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI
Abstract
Large Language Models (LLMs) are powerful but prone to hallucinations due to
static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting
external information, but current methods are often costly, generalize poorly,
or ignore the internal knowledge of the model. In this paper, we introduce
R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage
both internal and external knowledge sources. R1-Searcher++ employs a two-stage
training strategy: an initial SFT Cold-start phase for preliminary format
learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses
outcome-supervision to encourage exploration, incorporates a reward mechanism
for internal knowledge utilization, and integrates a memorization mechanism to
continuously assimilate retrieved information, thereby enriching the model's
internal knowledge. By leveraging internal knowledge and an external search
engine, the model continuously improves its capabilities, enabling efficient
retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++
outperforms previous RAG and reasoning methods and achieves efficient
retrieval. The code is available at
https://github.com/RUCAIBox/R1-Searcher-plus.
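The abstract describes an RL stage that uses outcome supervision and adds a reward encouraging the model to answer from its internal knowledge when external search is unnecessary. The sketch below is only an illustration of that idea, not the paper's actual formulation: it assumes exact-match outcome checking and a hypothetical fixed bonus weight, and all function and parameter names are made up for this example.

```python
# Minimal illustrative sketch of an outcome-supervised reward with an
# internal-knowledge utilization bonus. Names and weights are assumptions,
# not the formulation used in R1-Searcher++.

def exact_match(prediction: str, gold: str) -> bool:
    """Simple exact-match check on normalized answers (illustrative metric)."""
    return prediction.strip().lower() == gold.strip().lower()

def compute_reward(prediction: str, gold: str, num_search_calls: int,
                   internal_bonus: float = 0.5) -> float:
    """Outcome supervision: score only the final answer, not intermediate steps.
    Add a bonus when the model answers correctly without any external retrieval,
    nudging it to rely on internal knowledge when that suffices."""
    outcome = 1.0 if exact_match(prediction, gold) else 0.0
    bonus = internal_bonus if outcome == 1.0 and num_search_calls == 0 else 0.0
    return outcome + bonus

# Example: a correct answer produced with zero search calls earns the bonus.
print(compute_reward("Paris", "paris", num_search_calls=0))  # 1.5
print(compute_reward("Paris", "paris", num_search_calls=2))  # 1.0
```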