R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
May 22, 2025
Authors: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI
Abstract
Large Language Models (LLMs) are powerful but prone to hallucinations due to
static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting
external information, but current methods are often costly, generalize poorly,
or ignore the internal knowledge of the model. In this paper, we introduce
R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage
both internal and external knowledge sources. R1-Searcher++ employs a two-stage
training strategy: an initial SFT Cold-start phase for preliminary format
learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses
outcome-supervision to encourage exploration, incorporates a reward mechanism
for internal knowledge utilization, and integrates a memorization mechanism to
continuously assimilate retrieved information, thereby enriching the model's
internal knowledge. By leveraging internal knowledge and an external search
engine, the model continuously improves its capabilities, enabling efficient
retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++
outperforms previous RAG and reasoning methods and achieves efficient
retrieval. The code is available at
https://github.com/RUCAIBox/R1-Searcher-plus.
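The abstract describes an RL stage that uses outcome supervision and adds a reward encouraging the model to answer from its internal knowledge when external search is unnecessary. The sketch below is only an illustration of that idea, not the paper's actual formulation: it assumes exact-match outcome checking and a hypothetical fixed bonus weight, and all function and parameter names are made up for this example.

```python
# Minimal illustrative sketch of an outcome-supervised reward with an
# internal-knowledge utilization bonus. Names and weights are assumptions,
# not the formulation used in R1-Searcher++.

def exact_match(prediction: str, gold: str) -> bool:
    """Simple exact-match check on normalized answers (illustrative metric)."""
    return prediction.strip().lower() == gold.strip().lower()

def compute_reward(prediction: str, gold: str, num_search_calls: int,
                   internal_bonus: float = 0.5) -> float:
    """Outcome supervision: score only the final answer, not intermediate steps.
    Add a bonus when the model answers correctly without any external retrieval,
    nudging it to rely on internal knowledge when that suffices."""
    outcome = 1.0 if exact_match(prediction, gold) else 0.0
    bonus = internal_bonus if outcome == 1.0 and num_search_calls == 0 else 0.0
    return outcome + bonus

# Example: a correct answer produced with zero search calls earns the bonus.
print(compute_reward("Paris", "paris", num_search_calls=0))  # 1.5
print(compute_reward("Paris", "paris", num_search_calls=2))  # 1.0
```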