
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

May 22, 2025
作者: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations due to their static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by RL for dynamic knowledge acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
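
The abstract describes an outcome-supervised reward that also incentivizes answering from internal knowledge rather than always calling the external search engine. As a rough illustration only, the sketch below shows one plausible shape such a reward could take: a correctness term plus a small bonus when a correct answer is produced without any retrieval calls. The function names, the exact-match rule, and the bonus weight are assumptions made for illustration, not the authors' implementation; see the paper and repository for the actual design.

```python
# Hypothetical sketch of an outcome-supervised rollout reward with an
# internal-knowledge bonus. All names and weights here are illustrative
# assumptions, not R1-Searcher++'s actual reward function.

def exact_match(prediction: str, gold: str) -> bool:
    """Simple normalized exact-match check on the final answer."""
    return prediction.strip().lower() == gold.strip().lower()

def rollout_reward(prediction: str,
                   gold: str,
                   num_search_calls: int,
                   internal_bonus: float = 0.2) -> float:
    """Reward for one rollout.

    - Correctness term: 1.0 if the predicted answer matches the reference.
    - Internal-knowledge term: a bonus granted only when the answer is
      correct AND was produced without issuing any external search calls,
      nudging the policy to rely on its own knowledge when it suffices.
    """
    correct = 1.0 if exact_match(prediction, gold) else 0.0
    bonus = internal_bonus if (correct and num_search_calls == 0) else 0.0
    return correct + bonus

# Example: a correct answer found without retrieval earns the extra bonus.
print(rollout_reward("Paris", "paris", num_search_calls=0))  # 1.2
print(rollout_reward("Paris", "paris", num_search_calls=2))  # 1.0
print(rollout_reward("Lyon", "paris", num_search_calls=1))   # 0.0
```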
