
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

May 22, 2025
作者: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations due to their static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by RL for dynamic knowledge acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
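
The abstract describes an outcome-supervised reward that also incentivizes answering from internal knowledge rather than always calling the external search engine. As a rough illustration only, the sketch below shows one plausible shape such a reward could take: a correctness term plus a small bonus when a correct answer is produced without any retrieval calls. The function names, the exact-match rule, and the bonus weight are assumptions made for illustration, not the authors' implementation; see the paper and repository for the actual design.

```python
# Hypothetical sketch of an outcome-supervised rollout reward with an
# internal-knowledge bonus. All names and weights here are illustrative
# assumptions, not R1-Searcher++'s actual reward function.

def exact_match(prediction: str, gold: str) -> bool:
    """Simple normalized exact-match check on the final answer."""
    return prediction.strip().lower() == gold.strip().lower()

def rollout_reward(prediction: str,
                   gold: str,
                   num_search_calls: int,
                   internal_bonus: float = 0.2) -> float:
    """Reward for one rollout.

    - Correctness term: 1.0 if the predicted answer matches the reference.
    - Internal-knowledge term: a bonus granted only when the answer is
      correct AND was produced without issuing any external search calls,
      nudging the policy to rely on its own knowledge when it suffices.
    """
    correct = 1.0 if exact_match(prediction, gold) else 0.0
    bonus = internal_bonus if (correct and num_search_calls == 0) else 0.0
    return correct + bonus

# Example: a correct answer found without retrieval earns the extra bonus.
print(rollout_reward("Paris", "paris", num_search_calls=0))  # 1.2
print(rollout_reward("Paris", "paris", num_search_calls=2))  # 1.0
print(rollout_reward("Lyon", "paris", num_search_calls=1))   # 0.0
```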
