R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome-supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
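To make the idea of an outcome-supervised reward that also favors internal knowledge more concrete, the following is a minimal sketch and not the authors' implementation. The `Rollout` structure, the function names, the penalty and bonus values, and the rule "reward the correct rollout that used the fewest retrievals in its group" are all illustrative assumptions; the abstract does not specify how the reward is computed.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical structure, not from the paper)."""
    answer_correct: bool  # outcome signal: does the final answer match the reference?
    well_formatted: bool  # does the response follow the expected output format?
    num_retrievals: int   # number of external search calls made


def rollout_reward(r: Rollout, group_min_retrievals: int,
                   internal_bonus: float = 0.5) -> float:
    """Toy outcome-based reward; every weight here is an assumed value."""
    if not r.well_formatted:
        return -1.0  # assumed penalty for malformed output
    reward = 0.0
    if r.answer_correct:
        reward += 1.0  # outcome supervision: only the final answer is judged
        # Assumed bonus: correct rollouts that needed the fewest retrievals
        # in their group are taken to have relied most on internal knowledge.
        if r.num_retrievals == group_min_retrievals:
            reward += internal_bonus
    return reward


def score_group(rollouts: list[Rollout]) -> list[float]:
    """Score all rollouts sampled for the same question."""
    counts = [r.num_retrievals for r in rollouts
              if r.answer_correct and r.well_formatted]
    group_min = min(counts) if counts else 0
    return [rollout_reward(r, group_min) for r in rollouts]
```

Under these assumptions, a correct answer produced without any search call scores higher than an equally correct answer that needed two searches, which is the kind of pressure toward efficient retrieval that the abstract describes.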
